A practical summary for educators - with a few techie steps omitted to keep it simple.
1) The big picture
Think of Copilot or Gemini as very advanced “autocomplete”. Instead of predicting the next letter in a word, they predict the next bit of language in a sentence, a paragraph, or a plan. They have read a huge amount of text and learned patterns of how words and ideas tend to go together. When you ask a question, the tool does two jobs:
- Understand your request.
- Generate the most likely and helpful response, one small piece at a time.
They are not looking up a single page on the internet. They are using what they have statistically learned about language to produce a fresh answer. They can also combine this with live tools like web search or your organisation’s documents if that feature is turned on.
2) What is a Large Language Model (LLM)?
A Large Language Model is a computer program trained to spot patterns in text. During training, it is shown many examples of text and asked to guess the next token. Over and over, it gets a tiny nudge to be a bit more accurate next time. After months of this, it becomes very good at writing and explaining.
Tokens - the model’s building blocks
LLMs do not see full words. They break text into small chunks called tokens.
- “Classroom” might become “class” + “room”.
- “Assessment” might become “assess” + “ment”.
Working with tokens makes processing faster and more consistent across languages - the short sketch below lets you see the splits for yourself.
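Here is a minimal sketch using OpenAI's open-source tiktoken library - an assumption for illustration, since Copilot and Gemini each use their own tokenizers, but they split text in the same spirit:

```python
# Seeing tokens for yourself with the open-source tiktoken library.
# Copilot and Gemini use their own tokenizers, but the idea is identical.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used GPT tokenizer

for word in ["Classroom", "Assessment", "safeguarding"]:
    token_ids = enc.encode(word)                   # word -> list of token ids
    pieces = [enc.decode([t]) for t in token_ids]  # ids -> readable chunks
    print(word, "->", pieces)  # exact splits vary from tokenizer to tokenizer
```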
Context window - short-term memory
The model can only “see” a certain number of tokens at once. This is the context window, like a short-term memory. If your conversation or document is longer than that window, earlier parts fall out of view unless you summarise or re-include them. Bigger models often have larger windows, which helps with longer tasks.
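A toy picture of how that "falling out of view" happens - the budget and the word-count estimate here are simplifications; real systems count actual tokens:

```python
# The assistant only receives the most recent messages that fit its token
# budget, so older ones silently drop out. Counting words stands in for
# counting tokens - a simplification.
def fit_to_window(messages, max_tokens=4096):
    kept, used = [], 0
    for message in reversed(messages):   # walk backwards from the newest
        cost = len(message.split())      # crude stand-in for a token count
        if used + cost > max_tokens:
            break                        # everything older is forgotten
        kept.append(message)
        used += cost
    return list(reversed(kept))          # back into chronological order

# With a tiny budget, only the latest messages survive:
print(fit_to_window(["intro " * 50, "middle " * 50, "latest question"], max_tokens=60))
```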
3) How an answer is chosen
The model generates text one token at a time. For each next token, it produces a list of possibilities with probabilities. A few dials change how this feels:
- Temperature controls creativity - low values give focused, predictable wording; high values give more varied, adventurous wording.
- Top-p or top-k limits how many of the most likely tokens are considered. This reduces waffling. (Both dials appear in the sketch at the end of this section.)
- System and style instructions set tone and boundaries. For example, “be concise and use UK English”.
Put simply: it predicts what is most likely to be helpful next, within the rules you and the provider set.
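For the curious, here is a toy version of those dials. The candidate words and their scores are invented for illustration; a real model scores its whole vocabulary at every step:

```python
# Temperature and top-k in miniature: filter the candidates, reshape the
# odds, then pick one at random.
import math, random

def sample_next_token(scores, temperature=0.7, top_k=3):
    # Top-k: keep only the k highest-scoring candidates.
    candidates = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature: low values sharpen the odds, high values flatten them.
    weights = [math.exp(score / temperature) for _, score in candidates]
    words = [word for word, _ in candidates]
    return random.choices(words, weights=weights)[0]

scores = {"plan": 4.0, "lesson": 3.2, "idea": 2.5, "worksheet": 1.0}
print(sample_next_token(scores, temperature=0.1))  # almost always "plan"
print(sample_next_token(scores, temperature=2.0))  # far more varied
```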
4) Where the knowledge comes from
During training, the model consumes a large, mixed dataset: books, websites, articles, code, and sometimes licensed sources or curated data. It does not store exact copies of everything, but it does internalise patterns. That is why it can write about many topics without needing to fetch a specific web page.
Some deployments add Retrieval-Augmented Generation (RAG). That means the model can look up documents from a trusted source (your staff handbook, policy library, curriculum plans), pull out relevant snippets, and then write an answer that cites those sources. This is ideal in schools because it anchors responses to your policies rather than the general internet.
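A minimal sketch of the RAG pattern. The two-document "policy library" and the keyword-overlap scoring are invented for illustration; real deployments use embeddings and a vector database, but the retrieve-then-write shape is the same:

```python
# RAG in miniature: find the most relevant documents, then send them along
# with the question so the answer can cite them.
POLICY_LIBRARY = {
    "behaviour-policy.txt": "Pupils are expected to treat others with respect ...",
    "staff-handbook.txt": "Staff should report safeguarding concerns to the DSL ...",
}

def retrieve(question, k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(POLICY_LIBRARY.items(),
                    key=lambda kv: len(q_words & set(kv[1].lower().split())),
                    reverse=True)
    return ranked[:k]

def build_grounded_prompt(question):
    snippets = "\n".join(f"[{name}] {text}" for name, text in retrieve(question))
    # The snippets travel with the question, anchoring the answer to them.
    return f"Answer using only these sources, citing each one:\n{snippets}\n\nQ: {question}"

print(build_grounded_prompt("Who should staff report safeguarding concerns to?"))
```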
5) Why do these tools sometimes “make things up”?
Researchers call it “hallucination”; others might call it “confident waffle”. The model always tries to complete an answer, so if the prompt is vague or the model lacks the facts, it may guess. Common causes:
- The question is ambiguous.
- You asked for a very specific reference it has not seen.
What helps: be specific, ask the tool to show its sources, and point it at your own documents via RAG when possible.
6) Bias - what it is, why it happens, and what to do
What is bias here? Bias is when answers consistently lean in a particular direction that disadvantages people or misrepresents topics.
Why it happens:
- Biased training data - if the internet over-represents certain voices, the model can mirror that, under-representing some cultures, genders, and communities.
- Historical patterns - the model learns what appears most often, not what is most fair.
- Prompt framing - the way a question is asked can steer the model.
- Safety rules - alignment steps that try to keep outputs safe can also introduce unintended skews of their own.
What you can do in education:
- Ask for multiple perspectives: “Give 3 viewpoints with pros and cons.”
- Require source-anchored outputs: “Use DfE guidance and cite it.”
- Use checklists: “Before finalising, check for inclusive language and UK context.”
- Compare drafts with different prompts and choose the fairest result.
- Keep people in the loop - professional judgement remains essential.
7) Safety layers and alignment
Modern tools have guardrails: they filter harmful requests, reduce unsafe content, and follow provider policies. They may refuse to answer certain prompts. There are also enterprise controls so your IT team can set boundaries, log usage, and protect data. Alignment is imperfect but improving, so staff judgement and school policy remain key, and likely always will be. The growing popularity of AI companions also presents significant risks to children who interact with them as if they were human, especially where the AI lacks sufficient, credible protections.
8) Privacy and safeguarding essentials
- Treat the model like any third-party service. Do not paste personally identifiable information about pupils, staff, or families unless your licence and policy explicitly allow it.
- Prefer an education or enterprise version managed by your organisation.
- Switch on features that keep data within your tenant and disable training on your prompts if required.
- For pupil use, start with narrow tasks and supervised environments.
- Always follow your data protection and safeguarding policies.
9) Getting better answers - the educator’s “prompt craft”
Think of a prompt as a mini brief.
Structure that works (assembled into a worked example after this list):
- Role and goal - “You are a Key Stage 2 literacy coach. Produce a 30-minute guided reading plan.”
- Inputs - paste the text extract or success criteria.
- Constraints - UK English, 30 minutes, accessible language, stretch task for high attainers.
- Output format - headings, bullet list, table, or checklist.
- Quality check - “List any assumptions. Flag where evidence is weak.”
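Here is that five-part brief assembled into one prompt string - the helper function and the wording are purely illustrative, to show how the parts fit together:

```python
# The "mini brief" as code: five labelled parts joined into a single prompt.
def build_prompt(role_and_goal, inputs, constraints, output_format, quality_check):
    return "\n\n".join([
        role_and_goal,
        f"Inputs:\n{inputs}",
        f"Constraints: {constraints}",
        f"Output format: {output_format}",
        f"Quality check: {quality_check}",
    ])

print(build_prompt(
    "You are a Key Stage 2 literacy coach. Produce a 30-minute guided reading plan.",
    "<paste the text extract or success criteria here>",
    "UK English, 30 minutes, accessible language, stretch task for high attainers.",
    "Headings with a short bullet list under each.",
    "List any assumptions. Flag where evidence is weak.",
))
```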
Examples
- “Give 3 options at different challenge levels.”
- “Cite the DfE guidance you used and link it.”
- “Summarise in 150 words for a parent newsletter.”
10) Common classroom use cases
- Draft a first-pass lesson outline and then refine it yourself.
- Generate retrieval questions from your own text.
- Rephrase a policy for parents in plain English.
- Produce varied exemplars at different grades.
- Create a marking rubric, then tune the criteria to your standards.
- Turn a transcript into meeting minutes and action items.
- Translate a short message for families, then double-check with a native speaker or trusted translator where accuracy is critical.
11) Limits to remember
- It does not “understand” like a person - it recognises patterns. You provide the judgement.
- Facts can drift - use retrieval from trusted sources for anything high-stakes.
- It cannot replace teacher-pupil relationships or professional ethics.
- Long tasks can exceed the context window - summarise or chunk.
- Maths and data tables can trip it up - ask for step-by-step working and verify.
12) Under the bonnet - a simple mental model
- Neural networks - layers of maths that learn to map tokens to likely next tokens.
- Training - guess the next token, compare to the real one, adjust. Repeat billions of times (sketched in miniature after this list).
- Fine-tuning - an extra pass on targeted data, such as education content.
- Reinforcement learning from human feedback - humans rate outputs, nudging the model to be more helpful and safe.
- Tools and plugins - the model can call a calculator, a web search, or your document store to improve accuracy, then write the final answer in natural language.
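The "Training" bullet above, in miniature. A dictionary of word weights stands in for billions of neural-network parameters, and the text is ten words rather than trillions of tokens, but the loop is the same: guess, compare, nudge, repeat.

```python
# Guess the next word, compare with the real one, nudge the weights.
def train_step(model, context, actual_next, learning_rate=0.1):
    guesses = model.setdefault(context, {})
    guesses.setdefault(actual_next, 0.0)
    for word in guesses:
        target = 1.0 if word == actual_next else 0.0
        guesses[word] += learning_rate * (target - guesses[word])  # small nudge

model = {}
words = "the cat sat on the mat and the cat sat".split()
for _ in range(100):                       # "repeat billions of times", scaled down
    for before, after in zip(words, words[1:]):
        train_step(model, before, after)

# After training, the model's best guess for the word after "the" is "cat".
print(max(model["the"], key=model["the"].get))
```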
13) Copilot vs Gemini - what differs for schools
Both are LLM-powered assistants. The main differences are usually:
- Ecosystem integration - Copilot sits naturally with Microsoft 365 products; Gemini with Google Workspace.
- Data controls - your IT team can configure tenancy, logging, and compliance in each ecosystem.
- Tools - each has add-ons and connectors to fetch data or automate tasks.
For day-to-day teaching tasks, the prompting habits above matter more than the brand you choose. Pick the tool that fits your school’s platform, licences, and data protection approach.
14) A quick bias and safety checklist for staff
- Did I ask for multiple perspectives, not just one answer?
- Did I ask the tool to cite or anchor to our policies?
- Did I remove pupil or staff personal data?
- Did I check for inclusive language and local context?
- Did I verify any critical facts against a trusted source?
- Will I apply teacher judgement before sharing with pupils or families?
15) A 60-second explainer you can read out loud
“LLMs like Copilot or Gemini are pattern finders. They break text into tokens, predict what comes next, and generate a response one small piece at a time. They are trained on vast amounts of text so they can write, summarise, and explain. They do not think like humans and they sometimes guess, so we keep them grounded by pointing them at our own documents, asking for sources, and checking bias. When used in the right context, with clear prompts, school policies, and teacher judgement, they can be powerful assistants that save time and help us communicate more effectively.”
Quick glossary
- LLM - a model trained to predict the next token in text.
- Token - a chunk of text smaller than a word.
- Context window - how much text the model can “see” at once.
- Temperature - how creative or conservative the output is.
- RAG - Retrieval-Augmented Generation, where the model looks up trusted documents first.
- Hallucination - a confident but inaccurate answer.
- Alignment - safety rules and training so the model behaves helpfully.
Hope that all helps. Al Kingsley