In addition to the research’s primary objectives, which are derived from autonomous observational exploration, this model could also allow for a more directed, interventional and controlled form of exploration, encouraged through reinforcement learning.

Building on Pavlov’s work, B.F. Skinner showed that S-R links can be reinforced by pleasant or unpleasant environmental consequences, and that “…behavior is principally controlled by schedules of reinforcement”, where positive reinforcement was the delivery of a physical reward, while negative reinforcement was the removal or avoidance of unpleasant consequences (Morgan, 2010).

This showed that very complex forms of learning could be built up by reinforcing small, incremental changes in behaviour, and that “…encoding and automatization of such associations are crucial for the achievement of more complex interactions within our environment” (Allenmark, Moutsopoulou and Waszak, 2015). This is loosely analogous to how some machine-learning approaches improve by generating, testing and selecting among variations.

It might then be possible, through controlled simulation of stimuli within the virtual environment, to influence the collected S-R links. Presenting chosen stimuli could encourage certain types of associations to form, helping to steer the formation of knowledge in specific simulations, perhaps to understand how this affects the learning and teaching of an artificial entity.
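As a minimal sketch of this idea (the stimulus names, schedule and fixed response policy below are illustrative assumptions, not part of the cited work), an experimenter-defined stimulus schedule could bias which associations accumulate, simply because links for over-represented stimuli are exercised more often:

```python
import random
from collections import Counter

# Illustrative sketch: a biased stimulus schedule over-represents chosen
# stimuli, so the S-R links they exercise accumulate faster than others.
schedule = ["red_light"] * 5 + ["green_light"] * 1  # experimenter-chosen bias
link_counts = Counter()

def respond(stimulus):
    # Placeholder policy: a fixed response per stimulus.
    return {"red_light": "stop", "green_light": "go"}[stimulus]

for _ in range(600):
    s = random.choice(schedule)
    link_counts[(s, respond(s))] += 1

print(link_counts)  # ('red_light', 'stop') dominates purely from exposure
```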

D.O. Hebb suggested that learning occurs through “repeated stimulation” (Ghassemzadeh, Posner and Rothbart, 2013), where the connection to this knowledge theoretically grows stronger and becomes more accessible (and is therefore more evident in resultant behaviour).

Whether through self-exploration or controlled exploration, a repeated stimulus and its response could be rewarded either directly by an experimenter, or indirectly and dynamically during self-exploration (e.g. based on situational frequency).

Through simulation, one could emulate such rewarded responses by increasing the count or value of that S-R link, i.e. assigning a higher value to the association. This could help build a picture of whether a produced reaction is appropriate, and to what degree, particularly where behaviour or knowledge for the stimulus is not yet established.
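One minimal way to sketch this (the class name, update values and selection rule are assumptions for illustration, not a specified design) is a table of S-R associations whose values can be raised directly by an experimenter’s reward or indirectly through observed frequency, with the highest-valued response preferred when the stimulus recurs:

```python
from collections import defaultdict

class SRStore:
    """Illustrative S-R link table; values stand in for association strength."""

    def __init__(self):
        self.value = defaultdict(float)   # (stimulus, response) -> strength
        self.count = defaultdict(int)     # (stimulus, response) -> occurrences

    def observe(self, stimulus, response):
        # Indirect, frequency-based strengthening during self-exploration.
        self.count[(stimulus, response)] += 1
        self.value[(stimulus, response)] += 0.1

    def reward(self, stimulus, response, amount=1.0):
        # Direct, experimenter-delivered reinforcement.
        self.value[(stimulus, response)] += amount

    def best_response(self, stimulus):
        # Prefer the most strongly associated response, if any is known.
        candidates = {r: v for (s, r), v in self.value.items() if s == stimulus}
        return max(candidates, key=candidates.get) if candidates else None

store = SRStore()
store.observe("bell", "salivate")
store.reward("bell", "salivate", 2.0)
print(store.best_response("bell"))  # 'salivate'
```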

In this way, it might be possible to establish that a response contributes towards a goal; collecting S-R links that contribute towards that goal could then form the basis for learning how to behave, or what to do, to achieve it. This is not itself novel, as multi-agent simulations often use this approach, adapting to maximise their progress towards a goal. It could, however, establish a basis for simulating goal-oriented learning (or for testing such models in a virtual environment), which might help to gather observations relevant to understanding the nature of behaviour in the face of changing circumstances that affect those needs or goals. Indeed, detecting change, or assessing the severity or influence of changing circumstances, could be novel if the model could generalise and detect such change properly.
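To make the goal-contribution idea concrete (a hedged sketch; the even credit-sharing rule and the example episodes are assumptions, not a claim about the model), S-R links exercised on the way to a satisfied goal could each receive a share of the reward, so links that reliably precede goal achievement accumulate the most value:

```python
from collections import defaultdict

def reward_episode(values, episode, reward=1.0):
    """Spread a goal reward evenly over the S-R links used in an episode.

    values: maps (stimulus, response) -> accumulated strength.
    episode: ordered (stimulus, response) pairs that ended at the goal.
    """
    share = reward / len(episode)
    for stimulus, response in episode:
        values[(stimulus, response)] += share

values = defaultdict(float)
# Two successful episodes share one step; that shared link accumulates most.
reward_episode(values, [("hungry", "seek_food"), ("food_seen", "eat")])
reward_episode(values, [("hungry", "seek_food"), ("food_near", "eat")])
print(max(values, key=values.get))  # ('hungry', 'seek_food')
```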

An agent might utilise both directed exploration (reinforcement) and self-exploration as a means of encountering new stimuli and evaluating the resultant responses against its needs or goals. While not novel, this combination is useful.

An autonomous agent might simulate self-exploration by trying to perform different actions (e.g. self-experimenting to understand the new causality, events and relationships that form as a result), likely informed by previous observations and learning, while a directed approach (reinforcement) would influence the produced responses (or aim to witness them) by explicitly and prescriptively rewarding behaviour.
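A hedged sketch of this contrast (the action set, environment reactions and reward table are invented for illustration): a self-exploring agent samples its own actions and records what follows, while a directed regime prescribes in advance which behaviour earns a reward:

```python
import random

ACTIONS = ["push", "pull", "wait"]  # assumed action set

def world(action):
    # Placeholder causality: the environment's reaction to each action.
    return {"push": "door_opens", "pull": "door_stuck", "wait": "nothing"}[action]

# Self-exploration: sample actions freely and record observed causality.
observations = [(a, world(a)) for a in random.choices(ACTIONS, k=10)]

# Directed exploration: an experimenter explicitly rewards one behaviour.
rewarded = {("push", "door_opens"): 1.0}  # prescriptive reward table

for action, outcome in observations:
    score = rewarded.get((action, outcome), 0.0)
    print(action, "->", outcome, "| reward:", score)
```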

In autonomous exploration, it might be possible (and novel) to derive goals by grouping responses that satisfy a need or goal; in this way, agents might learn what it takes to achieve goals by rewarding S-R links that tend towards realising a goal, loosely analogous to how back-propagation adjusts weights to reduce an error function.
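As a minimal sketch of such grouping (the need labels and the experience log are illustrative assumptions), responses could be clustered by the need they satisfied, so that each group of S-R links amounts to a derived goal:

```python
from collections import defaultdict

# Illustrative log of experiences: (stimulus, response, need_satisfied).
experiences = [
    ("hungry", "seek_food", "hunger"),
    ("food_seen", "eat", "hunger"),
    ("tired", "rest", "fatigue"),
    ("hungry", "seek_food", "hunger"),
]

# Group S-R links by the need they satisfied; each group is a derived goal.
derived_goals = defaultdict(set)
for stimulus, response, need in experiences:
    derived_goals[need].add((stimulus, response))

print(derived_goals["hunger"])
# {('hungry', 'seek_food'), ('food_seen', 'eat')} -> what satisfying hunger takes
```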

Autonomous exploration would rely on gradually realising favourable S-R links through multiple experiences, while directed exploration and explicit rewards would identify favourable S-R links more expediently; however, this would bias training towards the specific S-R links the experimenter deemed relevant, rather than allowing them to be derived from the many variations an autonomous agent experiences over time.

One implication of this goal-oriented or self-rewarding behaviour is the potential ability to document which behaviours contribute towards achieving a goal or tendency. This could be very useful for understanding why patterns form, not merely that they do, and which behaviours or observations are specifically relevant.

References

Allenmark, F., Moutsopoulou, K. and Waszak, F. (2015) ‘A new look on S–R associations: How S and R link’, Acta Psychologica, 160, pp. 161–169. doi: 10.1016/j.actpsy.2015.07.016.

Ghassemzadeh, H., Posner, M. I. and Rothbart, M. K. (2013) ‘Contributions of Hebb and Vygotsky to an Integrated Science of Mind’, Journal of the History of the Neurosciences, 22(3), pp. 292–306. doi: 10.1080/0964704X.2012.761071.

Morgan, D. L. (2010) ‘Schedules of Reinforcement at 50: A Retrospective Appreciation’, The Psychological Record, 60(1), pp. 151–172. doi: 10.1007/BF03395699.