Data Nullius: The AI Playbook, A Colonial Echo
The morning light usually painted my studio in hues of possibility, but lately, a different shadow has fallen across my canvas.
For years, I’ve poured my heart into capturing the fleeting beauty of nature, translating the whisper of leaves and the grandeur of mountains into vivid digital art.
Each pixel, each brushstroke, a piece of my soul.
I’ve shared my work online, believing in the spirit of a connected world, a vibrant global gallery.
But now, there’s an unsettling murmur growing louder: my art, my carefully crafted expressions, might be silently fueling something else entirely—training data for an AI, without my permission, without a penny.
It’s a strange, quiet theft, like discovering your own backyard, once brimming with personal stories, has been declared an unclaimed wilderness.
Big AI companies are treating internet data as unclaimed, mirroring historical colonial land grabs.
This digital colonialism uses fair use and bundled consent to exploit data, but First Nations’ resistance offers a powerful blueprint for data sovereignty.
Why This Matters Now: Beyond the Pixels
This isn’t just about an artist’s personal feelings or a single creator’s loss.
It’s about the very foundations of our digital future and the ethical scaffolding (or lack thereof) underpinning the rapidly evolving world of Artificial Intelligence.
When powerful tech giants scrape vast troves of data – photos, videos, books, blog posts, and more – to train their AI models, usually without compensation or explicit consent, it creates a silent, systemic imbalance.
This practice raises profound ethical and legal questions regarding intellectual property and data ownership, benefiting a few at the expense of many (The Conversation article).
It’s not merely a technical challenge; it’s a fundamental test of our societal values.
Fair Use or Digital Theft? The Scramble for Data
Let’s talk plainly about what’s happening.
Big AI companies, such as OpenAI and Google, view the internet’s boundless data as an immensely valuable, essentially free resource.
They are actively collecting this material to train their products, like ChatGPT, often without offering any payment or even seeking permission from the original creators.
What’s their justification?
They argue that a specific aspect of American copyright law, the “fair use doctrine,” legitimizes this data acquisition (The Conversation article).
This stance introduces a counterintuitive insight: what is legally termed “fair use” can, in practice, feel remarkably like digital theft to the people whose work is being ingested.
The paradox sharpens further: OpenAI itself has reportedly accused other AI developers of scraping its own intellectual property.
It begs the question: is “fair use” a universally applicable principle, or a convenient legal shield deployed opportunistically?
The core problem here isn’t just about copyright; it’s about a foundational assumption that digital data, once online, becomes communal property, ripe for the taking.
This assumption overlooks the labor, creativity, and unique cultural context embedded within that data.
The Echo of Terra Nullius: No One’s Digital Land
First Nations communities globally observe these developments with a profound sense of familiarity.
Long before the digital age, their lands, their peoples, and their invaluable knowledges were subjected to strikingly similar treatment: exploited by colonial powers for self-serving gain (The Conversation article).
The concept of “terra nullius,” a Latin term meaning “no one’s land” or “land belonging to no one,” was historically weaponized by colonizers.
It served as a legal fiction to claim territories, particularly in Australia, effectively erasing the pre-existing sovereignty and connection of Indigenous peoples to their ancestral lands (The Conversation article).
This legal fiction of terra nullius in Australia was famously overturned by the landmark 1992 Mabo case.
This pivotal moment recognized the land rights of the Meriam people of the Murray Islands and affirmed the ongoing connection of First Nations peoples to their land in Australia (Mabo case, 1992).
The case legally dismantled terra nullius, paving the way for the Native Title Act 1993 (The Conversation article).
However, the unsettling echoes of “terra nullius” reverberate in the modern digital sphere.
When AI companies indiscriminately scrape billions of people’s data from the internet, it’s as if they operate under the belief that this data belongs to no one.
It’s akin to the historical, incorrect belief that the continent of Australia was an empty land, ready for the taking.
This mental model of data as a boundless, unowned resource is the bedrock of what is now being termed digital colonialism.
What the Research Really Says: Unpacking Digital Colonialism
The current dynamics between powerful (mostly Western) tech giants and the vast digital commons are deeply concerning.
Research highlights how this interaction constitutes a form of digital colonialism, where algorithms, data, and digital technologies are leveraged to exert power and extract data without explicit consent (The Conversation article).
This isn’t just about overt scraping; it also manifests in more insidious ways.
Insights for Ethical AI Operations:
- AI companies’ invocation of the ‘fair use doctrine’ to justify data scraping mirrors historical colonial justifications for resource exploitation under ‘terra nullius’.
This necessitates a re-evaluation of current legal frameworks surrounding data usage to prevent further exploitation and ensure creators’ rights are protected.
Businesses developing AI must rigorously scrutinize their data acquisition strategies, moving beyond mere legal compliance to embrace ethical sourcing and compensation.
- The long history of First Nations resistance against colonialism provides a valuable blueprint for contemporary movements challenging ‘digital terra nullius’ and asserting data sovereignty.
Drawing lessons from these struggles can empower communities to develop effective strategies for collective negotiation and resistance against digital exploitation.
Companies should engage with Indigenous data sovereignty movements and other community-led initiatives to understand and respect community-governed data, fostering trust and avoiding future ethical pitfalls.
- ‘Bundled consent’ mechanisms frequently present a Hobson’s choice, coercing individuals into relinquishing data to access essential services like banking or healthcare.
Policymakers and regulators must ensure genuine, granular consent mechanisms are in place to protect individuals from coerced data relinquishment and digital exclusion.
Businesses should champion user-centric consent models that provide genuine choices, moving away from all-or-nothing agreements that erode user autonomy and trust.
The Coercion of Consent: Hobson’s Choice in the Digital Age
Beyond direct scraping, digital colonialism often cloaks itself in the guise of “consent.”
We encounter it daily: the mandatory “accept all” button after a phone update, or the one required simply to access your online bank account.
This is what we call a Hobson’s choice: while it appears you have options, the reality is you only have one true path – to “agree.”
Imagine the consequences if you chose to reject this bundled consent.
You might find yourself locked out of essential services, unable to bank, use your primary communication device, or even access critical healthcare.
What seems like a choice to protect your intellectual property and privacy quickly becomes a choice for social exclusion.
This isn’t a new tactic.
Just as “terra nullius” was a colonial strategy to claim resources, Hobson’s choices have historically been used as a means of assimilation, pressuring individuals to conform to dominant norms.
Resisting Digital Colonialism: Lessons from First Nations Sovereignty
So, is assimilation, or silent acceptance, our only path forward?
Absolutely not.
The generations of First Nations resistance offer a powerful testament to the many ways to fight for sovereignty and survive.
Since colonial invasion, First Nations communities have steadfastly asserted, “it always was and always will be Aboriginal land” (The Conversation article).
Their survival proclamations and protests provide invaluable direction, as demonstrated by the Mabo case, for challenging and transforming legal doctrines that are used to claim knowledge.
The rise of First Nations data sovereignty movements offers a clear path forward.
These movements advocate for data to be owned and governed by local communities.
Within this framework, communities retain the agency to decide what, when, and how their data is used, and critically, the right to refuse its use at any point (The Conversation article).
This model proposes a future where “continuity of consent” is paramount: data would reside primarily on individuals’ or communities’ devices, requiring companies to explicitly request access each time they wish to use it (The Conversation article).
Community-governed changes to data consent processes and legislation would empower communities – whether defined by culture, geography, jurisdiction, or shared interest – to collectively negotiate ongoing access to their data.
This approach would ensure our data is no longer treated as a digital terra nullius, forcing AI companies to demonstrate through action that data truly belongs to the people.
Playbook You Can Use Today: Building an Ethical AI Ecosystem
Navigating this complex landscape requires a proactive, principled approach.
Here’s a playbook for businesses, creators, and individuals seeking to champion ethical data practices:
- Advocate for Transparent Data Sourcing: Demand and implement full transparency regarding how AI models are trained.
This means knowing the provenance of data, ensuring it wasn’t acquired without explicit consent or fair compensation (The Conversation article).
- Challenge ‘Bundled Consent’ Models: As consumers, refuse to accept terms that offer a Hobson’s choice.
As businesses, design consent mechanisms that are granular, clear, and genuinely empower users with choice over their data.
This directly addresses the coercive practices of bundled consent (The Conversation article).
- Support Data Sovereignty Initiatives: Engage with and learn from First Nations data sovereignty movements.
Their historical fight for self-determination over land and knowledge offers a robust framework for digital self-determination (The Conversation article, Mabo case, 1992).
- Implement ‘Continuity of Consent’ Principles: Explore and push for systems where data remains on individual or community devices, requiring explicit, renewed consent for each use.
This ensures continuous agency and prevents data from becoming a digital terra nullius (The Conversation article).
- Educate Your Ecosystem: Foster a culture of ethical data handling within your organization.
Educate employees, partners, and even customers about the implications of data scraping and the importance of data rights.
- Redefine ‘Fair Use’ for the AI Era: Actively participate in legal and policy discussions to redefine what constitutes ‘fair use’ in the context of AI training data.
Current interpretations are being challenged, and businesses have a role in shaping future ethical standards (The Conversation article).
- Prioritize Community-Governed Data Models: Collaborate with communities (cultural, geographic, or shared interest) to build models where data access and usage are collectively negotiated and controlled.
This shifts power dynamics and affirms that data truly belongs to the people.
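The "challenge bundled consent" point in the playbook above can be illustrated with a small Python sketch. The purpose names and helper functions below are hypothetical, invented for illustration; the property being demonstrated is that access to the service depends only on the purposes strictly required to deliver it, never on optional data sharing such as AI training.

```python
# Hypothetical granular-consent model: each purpose is opted into
# separately, and only purposes strictly required for the service
# may gate access -- no "accept all or nothing".

REQUIRED = {"account_security"}          # needed to deliver the service itself
OPTIONAL = {"analytics", "ai_training", "ad_personalisation"}


def can_use_service(choices: dict[str, bool]) -> bool:
    # Access depends only on the required purposes.
    return all(choices.get(p, False) for p in REQUIRED)


def permitted_uses(choices: dict[str, bool]) -> set[str]:
    # A purpose is permitted only with an explicit opt-in.
    return {p for p in REQUIRED | OPTIONAL if choices.get(p, False)}


# A user declines everything optional and still keeps full access:
choices = {"account_security": True, "ai_training": False}
assert can_use_service(choices)
assert "ai_training" not in permitted_uses(choices)
```

Contrast this with a bundled model, where `can_use_service` would demand every purpose at once: that is precisely the Hobson's choice the article describes.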
Risks, Trade-offs, and Ethics in the Digital Frontier
While the call for data sovereignty and ethical data practices is clear, implementing these changes comes with its own set of considerations.
The primary risk for businesses lies in potential legal challenges and significant reputational damage if they fail to address these concerns proactively.
There’s also the operational trade-off of increased complexity in data acquisition and management, potentially slowing down AI development.
However, the ethical imperative far outweighs these challenges.
Mitigations include:
- Proactive ethical frameworks: develop robust internal guidelines for data sourcing and consent that go beyond minimum legal requirements.
- Genuine consent mechanisms: invest in user-friendly consent platforms that offer true choice and transparency.
- Community engagement: build partnerships with data-owning communities, fostering trust and collaborative innovation.
The choice is not between innovation and ethics, but how to innovate ethically.
Tools, Metrics, and Cadence for Data Governance
To effectively implement these principles, organizations need the right infrastructure and oversight.
Suggested Stack:
- Consent Management Platforms (CMPs), such as OneTrust or Cookiebot, manage user consent effectively and transparently, moving beyond basic ‘accept all’ options.
- Data Provenance Tracking Solutions are technologies that record the origin and journey of every piece of data, ensuring ethical sourcing and auditable trails.
- Privacy-Enhancing Technologies (PETs) are tools that allow data analysis while preserving individual privacy, such as differential privacy or federated learning.
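A data provenance tracking solution of the kind listed above can, at its simplest, be an append-only ledger keyed by content hash. This is an illustrative sketch, not any particular product's API; `ProvenanceLedger` and its methods are hypothetical names, and a production system would add signatures, storage, and access control.

```python
import hashlib
import time


class ProvenanceLedger:
    """Illustrative append-only log recording where each training item
    came from and under what consent and licence it was obtained."""

    def __init__(self):
        self.entries = []

    def record(self, data: bytes, source: str, licence: str, consent: bool) -> dict:
        entry = {
            "sha256": hashlib.sha256(data).hexdigest(),  # content fingerprint
            "source": source,
            "licence": licence,
            "consent": consent,
            "ts": time.time(),
        }
        self.entries.append(entry)
        return entry

    def audit(self) -> float:
        """Fraction of items with documented consent -- one possible
        basis for the 'Data Provenance Score' KPI below."""
        if not self.entries:
            return 0.0
        return sum(e["consent"] for e in self.entries) / len(self.entries)


ledger = ProvenanceLedger()
ledger.record(b"licensed artwork", "artist-portfolio", "CC-BY-NC", consent=True)
ledger.record(b"scraped page", "unknown-crawl", "unknown", consent=False)
print(f"provenance score: {ledger.audit():.2f}")  # provenance score: 0.50
```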
Key Performance Indicators (KPIs):
- Granular Consent Opt-in Rate measures the percentage of users who provide specific consent for different data uses, indicating genuine engagement.
- Data Provenance Score is a quantifiable metric tracking the percentage of data used in AI models that has a clear, ethically sourced, and documented origin.
- Community Partnership Success Rate tracks the number and impact of collaborative projects with community-led data initiatives.
- User Trust Index is a survey-based index measuring user confidence in your data handling practices.
- Ethical Audit Score rates data practices against internal ethical frameworks and external standards through regular, independent audits.
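The Granular Consent Opt-in Rate above is straightforward to compute from per-user consent records. A minimal Python sketch, with hypothetical data:

```python
def opt_in_rate(consents: list[dict[str, bool]], purpose: str) -> float:
    """Share of users who explicitly opted into a given purpose.
    Missing keys count as no consent (no defaults in the user's favour)."""
    if not consents:
        return 0.0
    return sum(c.get(purpose, False) for c in consents) / len(consents)


# Hypothetical consent records for three users:
users = [
    {"analytics": True,  "ai_training": False},
    {"analytics": True,  "ai_training": True},
    {"analytics": False, "ai_training": False},
]
print(opt_in_rate(users, "ai_training"))  # ~0.33
```

Treating an absent key as a refusal is the important design choice: the metric only counts affirmative, granular opt-ins, never silence.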
Review Cadence:
- Monthly: review data provenance reports and consent acquisition metrics.
- Quarterly: conduct comprehensive ethical audits of AI model training data and consent management.
- Annually: publish transparency reports on data practices and engage with community stakeholders for feedback and continuous improvement.
FAQ
- What is ‘digital colonialism’ in the context of AI? Digital colonialism refers to powerful tech giants using algorithms, data, and digital technologies to exert control and extract data from others without genuine consent, mirroring historical colonial exploitation of land and peoples for their own benefit (The Conversation article).
- How does the historical concept of ‘terra nullius’ relate to AI data practices? AI companies’ practice of scraping internet data without compensation or consent is likened to ‘terra nullius’ (no one’s land), implying they treat this data as if it belongs to no one, similar to how colonizers claimed land considered ‘empty’ or ‘unowned’ (The Conversation article).
- What is ‘bundled consent’ and why is it problematic for data ownership? ‘Bundled consent’ occurs when users are forced to accept all terms (often for data sharing) to access essential services like banking or phone updates, presenting a ‘Hobson’s choice’ where rejecting means social exclusion, thereby undermining true consent and facilitating data exploitation (The Conversation article).
Glossary
- Digital Colonialism: The exercise of power over digital resources and data by powerful entities, reminiscent of historical colonial practices.
- Terra Nullius: Latin for “no one’s land,” a historical legal concept used to justify colonial claims over inhabited lands.
- Data Sovereignty: The right of a nation, people, or community to govern its own data, including its collection, storage, and usage.
- Bundled Consent: A type of digital consent where users must accept all terms to access a service, offering no granular control over data sharing.
- Hobson’s Choice: An apparently free choice where there is only one option, or where the refusal of the only option leads to an undesirable outcome.
- Fair Use Doctrine: A provision in copyright law that permits limited use of copyrighted material without acquiring permission from the rights holder, for purposes such as criticism, news reporting, teaching, scholarship, or research.
- Intellectual Property: Creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce.
Conclusion
The digital frontier, for all its promise, bears a striking resemblance to past colonial expansions.
My art, your photos, our collective digital footprints are not an empty wilderness, but a landscape rich with individual and cultural significance.
The choice between silent assimilation and active resistance defines our moment.
Just as Pemulwuy and other First Nations warriors demonstrated the many ways to resist seemingly all-powerful colonial empires, we too have agency in this new era.
By understanding the echoes of “data nullius,” embracing data sovereignty, and demanding genuine consent, we can collectively ensure that the future of AI is built on principles of equity, respect, and true ownership.
It is time to affirm, through action, that digital data belongs to the people, not to an imagined no-man’s-land.
Let’s build that future, together.
References
- The Conversation. “Data nullius: why the AI playbook is straight from the era of colonial empires.”
- Australian High Court. “Mabo case.” 1992.