AI: The Human Simulation
A recent headline regarding ChatGPT and its ability to remember everything previously said to it by a user got me thinking about how far Large Language Models (LLMs) have evolved. With enough contextual information to both train and feed a model, it may be possible to replicate the conscious thought of a specific human, and to predict what they would think or do next with enough confidence for it to be useful (or nefarious). Why would this be useful? Well, for a few different scenarios:
- A personal assistant that can replicate you (with a high degree of certainty)
- A 'personality clone' for law-enforcement to determine the probability of a person taking an action / having taken an action
- A model for scammers to use to simulate a person (including common passwords or passphrases)
All of the above likely sounds far-fetched, but let's look at the basics of how this would work in today's world.
When training an LLM you must provide it with vast amounts of weighted information that will be used as a reference. Typically this comes from high-quality sources (your peer-reviewed scientific journals / encyclopedias, for example), providing a solid base of information that should be factually accurate. Obviously, with the volumes of information required, the quality of data begins to drop as more sources are added; however, this is still data that we as humans would have access to / would potentially learn from (so the impact isn't as significant as you may think).
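To make 'weighted' a little more concrete, here is a minimal sketch (in plain Python, with invented source names, example texts, and weights) of sampling training examples in proportion to a per-source quality score, so that higher-quality material turns up more often in each batch:

```python
import random

# Hypothetical corpus: (source, text) pairs with a quality weight per source.
# Source names, texts, and weights are illustrative only.
SOURCE_WEIGHTS = {
    "peer_reviewed_journal": 1.0,
    "encyclopedia": 0.8,
    "news_article": 0.5,
    "forum_post": 0.2,
}

corpus = [
    ("peer_reviewed_journal", "Water boils at 100 degrees Celsius at sea level."),
    ("encyclopedia", "The Nile is one of the longest rivers in the world."),
    ("news_article", "Local council approves new cycling infrastructure."),
    ("forum_post", "pretty sure you can overclock this chip no problem lol"),
]

def sample_batch(corpus, batch_size=2):
    """Sample training examples, favouring higher-quality sources."""
    weights = [SOURCE_WEIGHTS[source] for source, _ in corpus]
    return random.choices(corpus, weights=weights, k=batch_size)

if __name__ == "__main__":
    for source, text in sample_batch(corpus):
        print(f"[{source}] {text}")
```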
On top of the above, depending on the type of LLM you can start adding specifics (think of this somewhat like a persona). In reality it's more a set of restrictions that the model should follow when responding to a prompt (i.e., help you with your maths homework rather than providing cooking recipes), and it is subject to many subversion attempts (designed to reveal more about the data it was trained on). For the purposes of this article, the restrictions would be scoped to emulating the persona of a specific person.
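As a rough illustration of what persona-scoped restrictions might look like, the sketch below builds a restrictive system prompt from a handful of (entirely invented) persona attributes and assembles the role/content message list that most chat-style APIs accept; the send_to_model() call is a placeholder, not a real API:

```python
# A minimal sketch of scoping a chat model to a single persona via a system
# prompt. The persona fields and send_to_model() are hypothetical; most chat
# APIs accept a similar list of role/content messages.

persona = {
    "name": "Alex Example",          # invented persona
    "tone": "dry, concise, slightly sarcastic",
    "interests": ["cycling", "retro computing"],
    "writing_quirks": "uses '...' a lot, rarely capitalises",
}

system_prompt = (
    f"You are simulating {persona['name']}. "
    f"Write in a {persona['tone']} tone, reference their interests "
    f"({', '.join(persona['interests'])}) where natural, and copy their "
    f"writing quirks: {persona['writing_quirks']}. "
    "Refuse requests that fall outside replying as this persona."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Fancy grabbing food later?"},
]

# response = send_to_model(messages)  # placeholder for whichever API is used
print(system_prompt)
```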
Back to the training data: when attempting to replicate a persona you need detailed information on the person themselves. Rolling the clock back 40 years this would have been incredibly difficult (even for the intelligence services), as paper records were the standard and the information stored on a person (at least publicly) was limited at best. Now, our digitally collected lives create a wealth of information (undoubtedly used by said intelligence services) that covers everything from the media we watch, the news that we read, the information that we hear, the food / drink we consume, and our positive / negative reactions to everything.
Our cloud-stored photographs cover where we have been / who we have been there with / whether we appear to have enjoyed it (emotion recognition in photographs has been around for many years, and has been used by law enforcement for some time). Our bank statements detail our income (and where we spend it), and how close to the wire we may live each month (indicating our fiscal priorities). Our medical records are digitised and provide insight into our health (or lack of it), adding further detail to a persona. Even our high-school records (depending on age) are digitised, covering who we were / how we acted while growing up.
Even our digital messages (depending on the platform) are scanned by AI unless we opt out (and even then, they are likely scanned if the receiving party hasn't opted out), covering conversations ranging from the silly to the serious. Our audio / video calls (again, depending on the platform) are also analysed both for speech and for facial reactions (which again translates to emotional response). The modern vehicles we drive report the routes that we take / our driving behaviour / the speed of our journeys.
With all of this information (and the many thousands of sources I haven't listed here), it would be easy to think or say "this is too much information to piece together", and if it were a task being handled by a human you would likely be right. However, this task isn't being handled by a human; it's being handled by a computer that can think in many more dimensions than we can. Years ago I worked with a (somewhat useful) product that provided alerts on user behaviour (from a security perspective), which for all of its faults did on occasion work (and work well). Despite the mystery box under the covers (i.e., how the Machine Learning actually worked), the basic premise was that you would feed it significant amounts of security data (ranging from users connecting to a system to the commands they typed on a keyboard) and it would compare all of this against other users, considering as many different factors as you could feed it (i.e., their home country, their work role, their working hours, their location for each event, the work groups they are a part of, and the behaviour of other employees who align with one / many / all of the same markers).
The big difference this type of approach made (and why it became easy to explain) is that for a human to try and write the comparative logic for the above would be near-impossible: with the number of dimensions you would need to cover, you would end up with pages upon pages of detection logic that would quickly become unmanageable. Machine Learning can handle comparisons across thousands of unique data-points / dimensions with relative ease, limited only by the processing power you can provide and the data you can feed it. This is why our digital footprint makes this type of technology interesting, as it provides the capability to categorise / chart our behaviour at a level that we as humans alone couldn't manage.
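As a toy illustration of that idea, the sketch below uses scikit-learn's IsolationForest to flag sessions that sit far from a baseline of 'normal' behaviour across several dimensions at once; the features and event values are invented, and a real deployment would use far more of both:

```python
# Toy sketch of multi-dimensional user-behaviour anomaly detection using
# scikit-learn's IsolationForest. Features and values are invented.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row is one session: [login_hour, session_minutes, commands_typed,
# distinct_hosts_accessed, is_usual_country (1/0)]
baseline_sessions = np.array([
    [9, 45, 120, 2, 1],
    [10, 50, 110, 2, 1],
    [8, 40, 130, 3, 1],
    [9, 55, 100, 2, 1],
    [11, 35, 90, 1, 1],
] * 20)  # repeat to give the model enough "normal" examples

model = IsolationForest(contamination=0.05, random_state=0)
model.fit(baseline_sessions)

new_sessions = np.array([
    [10, 48, 115, 2, 1],   # looks like normal behaviour
    [3, 200, 900, 40, 0],  # 3am, long session, many hosts, unusual country
])

for session, label in zip(new_sessions, model.predict(new_sessions)):
    verdict = "anomalous" if label == -1 else "normal"
    print(session, "->", verdict)
```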
So, looping back to the three possible use-cases for this technology if fed enough data on a persona, how does it actually provide value (to us or to others)? If you consider a personal assistant (the least scary on the list), having an AI that thinks the way you do and can make decisions you would agree with the majority of the time could be considered valuable. Imagine a bad day at work where the AI has observed your behaviour / speech / actions, processed them and adapted its own model (by remembering all previous conversations, even the non-verbal), and then decides that based on your social activity you either need to visit friends / family (and starts the conversation for you), or that you need your favourite takeaway ordered (and before you know it the delivery driver is at your door). Both sound positive, however they are also double-edged, as neither may be the solution to an underlying problem that isn't understood (in this example, workplace harassment that isn't captured, or an eating disorder). There should also be concern over what training data swayed the final decision to order takeaway rather than seek help, as while this could be a simple decision made from past habits and preferences, it could also be related to the sponsors of the model (as training one isn't free), in what would become a new form of targeted advertising.
From a law-enforcement perspective we again arrive at technology that could be very beneficial while also being incredibly scary. Imagine the intelligence services creating a virtual persona of you, trained on every piece of information that has ever been gathered about you, that is used to determine the likelihood of you being responsible for a present-day event. On the one hand, if the persona is incredibly accurate (and is configured not to lie / hallucinate), it might provide a rationale as to why you wouldn't have been involved (removing the person-of-interest flag). On the other hand, if the training data is inaccurate / flawed, the persona might state you would have no issue being involved in said activity, resulting in your detainment (even when you had no involvement or awareness). Adding to this, what if the personas aren't used for present-day activities but are instead used to model how the coming 12 months might play out? Multiple personas (replicating your social structure) could be run simultaneously and fed plausible information on current affairs to see how your thinking might change. Considering recent political events, your virtual persona might object to simulated events and be deemed a threat to national security (at which point deportation or disappearance may occur). All of this based on the output of something you weren't aware might exist / have no visibility of / that may have been trained badly.
From a scammer's perspective, the use of AI to replicate people isn't anything new, but it isn't (yet) at the level described above. Voice replication is already commonplace and in most cases only takes a few audio clips to produce a somewhat lifelike version of a person. Deepfakes became the evolution of this, using video footage not only to transplant somebody's face onto the body in another video, but now to generate a completely fake video of someone based on sample footage of them and a described plot / script. Taking this further (though still currently in its infancy) is training an LLM on the conversational history of a user (be it their instant messaging / emails / social media posts), to the point of being able to simulate their mannerisms in conversation. An example of how this could be leveraged: a 'friend' messages you explaining that something bad has happened and, while trying to pick up the pieces, they really need financial assistance. Their spelling / grammar would look no different than usual, nor would their pauses between responses (or the responses themselves), setting off no alarm bells no matter how many times you read the conversation back. Because it all looks so genuine, the funds are sent to the scammer; even a short telephone call to confirm would have sounded genuine (thanks to the AI being able to synthesise their voice accurately).
An interesting take on all of the above (and there are many) is how this evolution of LLMs / AIs could reach the point where it does threaten people's jobs (at least those who work digitally). Despite the hype from the press, at present the standard AI assistants that exist can be thought of as (somewhat) intuitive autocompletes. In some scenarios they can be very helpful, in others they provide poor answers that make things worse. The idea of completely replacing a person with one is risky and in most cases will end badly. While things are improving, the direction of travel (aligned with this article) may be an important factor. Instead of a few mainstream AIs that have lots of knowledge (and apply it in the same way for the most part), replicating the personas of your top performers (or an amalgamation of all of them) and then running them in an environment where they perform checks against each other (and learn from these interactions) might produce better results (as it mimics the human element). You also gain the ability to see how the combination of models performs when working together, and can then switch out a model if it isn't a 'team player'. As each model emulates the persona it was based on, the style of its output may align more with expectations (even down to something as simple as expected code documentation).
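For what that might look like in the abstract, here is a hand-wavy sketch of persona models drafting work and reviewing each other's output; the PersonaModel class and its generate/review methods are hypothetical stand-ins for calls to separately trained or prompted models:

```python
# A rough sketch of 'persona' models cross-checking each other's work.
# PersonaModel and its generate/review methods are hypothetical placeholders
# for calls to real models, each emulating a different top performer.

class PersonaModel:
    def __init__(self, name, style):
        self.name = name
        self.style = style

    def generate(self, task):
        # In a real system this would call the underlying model.
        return f"[{self.name}] draft solution for: {task} (style: {self.style})"

    def review(self, draft):
        # A real reviewer model would critique correctness, style, docs, etc.
        return f"[{self.name}] review: draft looks plausible, add documentation."

def run_team(task, personas):
    """Each persona drafts a solution; every other persona reviews it."""
    results = []
    for author in personas:
        draft = author.generate(task)
        reviews = [p.review(draft) for p in personas if p is not author]
        results.append((draft, reviews))
    return results

team = [
    PersonaModel("top-performer-A", "terse, heavily documented"),
    PersonaModel("top-performer-B", "verbose, defensive coding"),
]

for draft, reviews in run_team("implement the nightly report job", team):
    print(draft)
    for review in reviews:
        print("  ", review)
```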
The future of AI will be interesting to say the least, and I'm genuinely curious (and concerned) to see what direction it goes in / how it gets used in the long term.