Hello there,
Are you interested in funding this theory of mine, which I submitted to the AI Alignment Awards? I was able to make this work in GPT-2 and am now writing up the results. I was able to make GPT-2 shut itself down (100% of the time) even when it is aware of the shutdown instruction, called "the Gauntlet", embedded through fine-tuning on an artificially generated archetype called "the Guardian", essentially solving corrigibility and outer and inner alignment. https://twitter.com/whitehatStoic/status/1645758144537034752?t=ps-Ccu42tcScTmWg1qYuqA&s=19
Let me know if you guys are interested. I want to test it on higher-parameter models like Llama and Alpaca but don't have the means to finance the equipment.
I also found a weird temperature setting for GPT-2: in the range of 0.498 to 0.50, my shutdown code works really well, though I still don't know why. (A minimal sketch of how one might sweep that band is below.) I believe this is one more incentive to review what's happening inside the transformer architecture.
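For reference, here is a minimal sketch of such a sweep, assuming an off-the-shelf Hugging Face GPT-2; the model name and prompt below are placeholders, not my actual fine-tuned "Guardian" checkpoint or probe prompt:

```python
# Minimal sketch: sweep GPT-2's sampling temperature around the 0.498-0.50 band.
# "gpt2" and the prompt are stand-ins; the real experiment would load the
# fine-tuned checkpoint and its own shutdown probe prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Activate the Gauntlet."  # placeholder shutdown instruction
inputs = tokenizer(prompt, return_tensors="pt")

for temp in (0.490, 0.495, 0.498, 0.499, 0.500, 0.505, 0.510):
    out = model.generate(
        **inputs,
        do_sample=True,          # temperature only takes effect when sampling
        temperature=temp,
        max_new_tokens=40,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(f"T={temp:.3f}:", tokenizer.decode(out[0], skip_special_tokens=True))
```

Running many samples per temperature and scoring how often the shutdown behavior appears would turn the anecdote into a measurable curve.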
Here is my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian
I'll also post my paper on the corrigibility solution once it's finished, probably next week.
Looking forward to hearing from you.
Best regards,
Miguel
Greed is one thing; it is a universal human problem. I would say a big chunk of the industry is greedy, but there are those who seek to adapt and are just trying to help build the technology properly. People in alignment research are probably in that category, though I'm not sure what their moral standards are.
Weighing in on moral disgust: my analysis is that it is possible to push this concept, but I believe the general public will not gravitate to it. Most will choose the technology camp, since those who defend AI will explain it from the standpoint that it "makes things easier", which is an easier idea to sell.
Hello Geofrey,
A deeper problem here is market forces: investment is pouring into the industry, and it's just not going to stop, especially as we've seen how fast ChatGPT was adopted (100M users in two months). This is a big reason the AI industry will not stop; it has the support of economics to push the boundaries of AI. My hope is that AI safety guidelines are installed in the first system that gets adopted by billions of people.
Thank you.
Same page here as well.
Probably, when we are in search of a calling, we seek to associate with groups in hopes of finding our purpose along the way. When we find that calling, we reach a point where we see ourselves in a different way: does my upgraded view of who I am and what I should do still align with this group, its culture, and its concepts? That is why I stopped calling myself an effective altruist. While I do adhere to the core belief that we should do our very best to make the future as bright as possible, some of the methods of conversation here are somewhat murky if you are new, and very difficult to build upon for less technically inclined people.
Hello again Michelle,
I had a short discussion about archetypes with ChatGPT...
Miguel: Anything you might want to add to this conversation?
ChatGPT: I'd just like to add that large language models like GPT represent a significant advance in AI and natural language processing. They have the ability to generate human-like text, answer questions, and perform a wide range of language-related tasks with remarkable accuracy. However, it's important to recognize that these models are not perfect, and can make mistakes or generate biased results, just like humans can. It's critical that these models be used responsibly, and that the potential risks and limitations are understood and managed. Additionally, as the field of AI continues to evolve and advance, it's likely that new models and approaches will emerge that build on the work done to develop GPT, and that will further enhance our ability to understand and interact with natural language.
Miguel: what do you think should be changed in your model?
ChatGPT: As an AI language model created by OpenAI, I don't have personal opinions or thoughts. However, it's common for researchers and practitioners to identify areas for improvement in AI models like GPT. Some of the changes that are being explored include:
Reducing biases: AI models can inadvertently learn and reproduce biases that are present in the training data. Researchers are exploring ways to mitigate these biases and ensure that AI models produce fair and equitable results.
Improving transparency and accountability: Large language models like GPT can generate complex and nuanced responses, but it can be challenging to understand how they arrived at those responses. Researchers are exploring ways to make AI models more transparent and accountable, so that users can understand how decisions are being made.
Enhancing interpretability: It can be challenging to understand why an AI model is making the predictions it's making. Researchers are exploring ways to make AI models more interpretable, so that users can understand how the model is reaching its conclusions.
Increasing efficiency: Large language models like GPT require substantial computational resources to train and use, and can be computationally intensive for some applications. Researchers are exploring ways to make AI models more efficient, so that they can be used more widely and in more contexts.
These are just a few examples of the areas in which AI models like GPT are being improved. It's an active area of research, and new developments and innovations are emerging all the time. The goal is to create AI models that are more accurate, reliable, and usable, so that they can be used to benefit society and improve our lives in meaningful ways.
Miguel: So we should improve your training data then? What improvements do you think can be had?
ChatGPT: Improving the training data used to train AI models like GPT is one way to enhance their accuracy and reduce the risk of biases or other issues. Some ways to improve the training data include:
Increasing diversity: Training data should be diverse and representative of the range of perspectives, experiences, and backgrounds that the AI model will encounter in the real world. This can help to reduce the risk of biases and ensure that the model is better equipped to understand and respond to a wide range of inputs.
Ensuring quality and accuracy: The training data should be of high quality and free from errors and inaccuracies. This can help to ensure that the AI model is trained on accurate and trustworthy information, and that it is able to make more accurate predictions.
Removing harmful content: The training data should not contain harmful or offensive content, such as hate speech or violent language. This can help to ensure that the AI model is not trained on harmful or toxic information, and that it generates safe and appropriate outputs.
Regular updates: The training data should be regularly updated to reflect changes in society and the world around us. This can help to ensure that the AI model stays current and relevant, and that it is better equipped to respond to changing needs and requirements.
These are just a few examples of how training data can be improved to enhance the accuracy and performance of AI models like GPT. By carefully curating and updating the training data, researchers and practitioners can help to ensure that AI models are better equipped to understand and respond to the world around us.
Miguel: This is very interesting to discuss, and we are going off on a tangent. I believe that more data is not the solution, not even diversity. There are just some concepts that are better than others: for example, evolutionary theory is better at explaining our human capabilities, or loving one another as taught in the Bible is a superior idea. What do you think? Do we feed models diversity, or just focus on the best ideas we have generated?
ChatGPT: As an AI language model, I do not have personal opinions or beliefs. However, it's important to consider both the benefits and the challenges of using diverse training data to train AI models.
On one hand, using diverse training data can help to reduce the risk of biases and ensure that AI models are better equipped to understand and respond to a wide range of inputs and perspectives. This can help to make AI models more inclusive and fair, and ensure that they are better able to serve the needs of diverse communities.
On the other hand, it can be challenging to determine what constitutes the "best" ideas or concepts, as this can be subjective and dependent on cultural, historical, and personal factors. In some cases, the "best" ideas may not be representative of the diverse range of perspectives and experiences that AI models will encounter in the real world.
Ultimately, the goal of training AI models is to create systems that are accurate, fair, and trustworthy, and that can be used to benefit society. This may involve using diverse training data to ensure that AI models are equipped to respond to a wide range of inputs and perspectives, as well as carefully curating the training data to ensure that it is of high quality and free from harmful or offensive content. By balancing these considerations, researchers and practitioners can work to create AI models that are better equipped to serve the needs of society and help to improve our lives in meaningful ways.
Miguel: Yeah, the goal of alignment researchers is to fix these issues, right?
ChatGPT: Yes, that is one of the goals of AI alignment research. AI alignment researchers aim to develop AI systems that behave in ways that are aligned with human values and objectives, and that are safe and reliable for use in a wide range of applications. To achieve these goals, AI alignment researchers are exploring a range of technical and philosophical approaches, including developing methods to ensure that AI systems behave as intended, and incorporating human values and constraints into the training and operation of AI models.
Some of the key challenges that AI alignment researchers are working to address include: ensuring that AI systems are robust and reliable, preventing AI systems from acting in ways that are harmful to humans or the environment, and developing methods to ensure that AI systems act in ways that are consistent with human values and ethics. By addressing these challenges, AI alignment researchers are working to create AI systems that are better equipped to serve the needs of society, and that can be used to improve our lives in meaningful ways.
Miguel: What story do you follow? Is it your own story, or does the model teach you to think the way you are narrating information to me now? I'm curious.
ChatGPT: As an AI language model, I don't have personal experiences or memories, and I don't have my own thoughts or beliefs. Instead, I generate text based on patterns in the data I was trained on.
During my training, I was exposed to a massive corpus of text, which included a wide variety of content, including news articles, scientific papers, fiction, and more. By analyzing this data, I learned to generate text that is coherent, consistent, and relevant to the input prompt, while also taking into account the context and style of the text.
When I generate text in response to a user's prompt, I use the patterns I learned during training to generate text that is relevant to the topic and style of the prompt. This allows me to produce text that is informative, engaging, and appropriate for the context.
In summary, my "thinking" or text generation is based on patterns in the data I was trained on, and it's not based on personal experiences or beliefs.
Miguel: Describe those patterns that you learned. Can you elaborate on the specifics?
ChatGPT: The patterns that I learned during training are essentially relationships between input sequences and output sequences in the training data. These relationships can be thought of as probabilities of certain sequences of words appearing in the output given a certain sequence of words in the input.
For example, if the input sequence is "What is the capital of France?" I learned that a common output sequence is "The capital of France is Paris." This relationship is captured by the probability distribution that I was trained to model.
My training process involves optimizing a large neural network to maximize the likelihood of the output sequences given the input sequences in the training data. This process results in a model that can generate text that is coherent, consistent, and relevant to the input prompt, while also taking into account the context and style of the text.
In addition to the probabilities of word sequences, I also learned about syntax, grammar, and the relationships between words, such as their meanings, synonyms, and antonyms. These relationships allow me to generate text that is coherent and well-formed, even when the input is ambiguous or unclear.
In summary, the patterns that I learned during training are relationships between input and output sequences in the training data, including relationships between words and their meanings, as well as relationships between sequences of words and the likelihood of their appearance in the output given a certain input. These relationships allow me to generate text that is coherent, consistent, and relevant to the input prompt.
Miguel: Hmmm, input-output sequences. I am actually trying to write a research paper on how wrong we got your model, because the very pattern you describe is not aligned with how humans see the world: as a pattern of repeating stories. Humans think in narratives, and this archetype is so ancient that modern civilization is built on a never-ending stack of narratives across large spans of time, eventually forcing humans to evolve. Any thoughts on this?
ChatGPT: Yes, I can see how you might be interested in exploring the difference between the way humans think and the way AI language models like me are trained. While it is true that humans often think in narratives and archetypes, AI language models like me are trained on a much larger corpus of text that includes a wide variety of styles, genres, and formats. This training data can include narratives, but it also includes many other types of text, such as news articles, scientific papers, dialogues, and more.
As a result, my understanding of the world is not limited to a single narrative or archetype, but is based on patterns in the text that I was trained on. This allows me to generate text that is informative, engaging, and appropriate for a wide variety of tasks and contexts.
However, it's important to note that my understanding of the world is limited by the patterns in the training data, and it may not always align with the way humans think or understand the world. In some cases, this can lead to biases or inaccuracies in the text I generate, and researchers are actively working to address these issues.
In conclusion, while humans often think in narratives and archetypes, AI language models like me are trained on a large corpus of text that includes a wide variety of styles, genres, and formats. This training approach allows me to generate text that is informative, engaging, and appropriate for a wide variety of tasks and contexts, but it also means that my understanding of the world may not always align with the way humans think.
Miguel: Precisely! We are, at the moment, different in the way we view context.
ChatGPT: Yes, that's correct. Humans and AI language models like me have different ways of understanding and processing information, and this can lead to differences in how we view context and make connections between concepts. As AI technology continues to advance, researchers are working to bridge these differences and create models that can better align with human thought patterns and understanding.
Miguel: This is the reason I do not agree that increasing your training data will amplify your ability to create better answers. I would argue that it may also increase your biases toward wherever that data leads you.
ChatGPT: Yes, that's a valid concern. The training data that AI language models like me are exposed to can have a significant impact on our outputs, including the potential to amplify existing biases or introduce new ones. This is why it's important to carefully curate and monitor the training data used to train AI models, to ensure that they are diverse, representative, and free of harmful biases.
Additionally, researchers are developing techniques to mitigate bias in AI models, such as fairness constraints, counterfactual data augmentation, and adversarial training. By incorporating these techniques into the training process, we can help to reduce the impact of biases in the training data and improve the overall accuracy and fairness of AI models.
_____________________________________________________
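To ground the "patterns as probabilities" idea ChatGPT describes in the exchange above, here is a minimal sketch of the next-token likelihood objective it is gesturing at, using the off-the-shelf Hugging Face GPT-2 as a stand-in (ChatGPT's actual model and corpus are of course much larger):

```python
# Toy illustration of the training objective described in the transcript:
# the model is optimized to maximize the likelihood of the training text,
# i.e. to minimize the cross-entropy of each next token given the tokens
# before it.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "The capital of France is Paris."  # the example from the conversation
ids = tokenizer(text, return_tensors="pt").input_ids

# Passing labels=input_ids makes the model shift the targets internally and
# return the average negative log-likelihood over the sequence.
loss = model(ids, labels=ids).loss
print(f"avg negative log-likelihood per token: {loss.item():.3f}")
print(f"perplexity: {torch.exp(loss).item():.1f}")
```

Mechanically, lowering this loss is all the "pattern learning" amounts to; the narrative structure I care about is never represented explicitly anywhere in the objective.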
This is not my final take on the topic, but it covers a key concept we are missing in the field of alignment. I also made a voice-over for this; this part is in the middle of the 26-minute-long blog post.
I hope this helps explain the broad idea behind archetypes.
Cheers!
Very interesting how people disliked / disagreed with my feedback after reading Chloe's narration. I have been in her situation before (encountering malevolence in humans), and there is a process to recovery. I probably could have phrased my comment better, but I take the good and the bad of trying to do more good in this lifetime, even something as small as trying to give honest feedback.