In Leaked Audio, Microsoft Cherry-Picked Examples to Make Its AI Seem Functional

science, technology · Jan 13, 2024

Recent reports have shed light on Microsoft’s tactics regarding its generative AI, with leaked audio revealing that the tech giant may have cherry-picked examples to present a more positive view of its AI capabilities to potential customers. The details emerged from an internal presentation on an early version of Microsoft’s Security Copilot, an AI tool aimed at aiding cybersecurity professionals.

The Story Behind the Leaked Audio

Business Insider obtained the leaked audio, which disclosed how Microsoft reportedly selected specific instances of the AI's output to showcase its functionality, despite encountering challenges with the technology.

The leaked audio features a Microsoft researcher discussing the outcomes of “threat hunter” tests in which the AI analyzed a Windows security log for potential malicious activity. The researcher admitted to “cherry-picking” examples to display favorable results, citing the AI's tendency to provide inconsistent and sometimes incorrect answers.

Lloyd Greenwald, a Security Partner at Microsoft, acknowledged the difficulty of obtaining accurate responses from the AI: the model would often stray off course and return different answers to identical queries.

The Functionality of Security Copilot

Security Copilot operates much like a chatbot: users type in queries and receive responses in the style of a customer service representative. The AI is built primarily on OpenAI's GPT-4 large language model, which also underpins other Microsoft generative AI initiatives, such as the Bing Search assistant.
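Security Copilot's internals are not public, but the interaction pattern described here, a natural-language query routed to GPT-4, can be sketched with OpenAI's public Python client. Everything below (the prompt, the sample log line, the model settings) is an illustrative assumption rather than Microsoft's actual pipeline:

```python
# Minimal sketch of a chatbot-style security query against GPT-4,
# using OpenAI's public Python client. Illustrative only: Security
# Copilot's real prompts, model configuration, and post-processing
# are not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical Windows security log excerpt (invented for illustration).
log_excerpt = (
    "Event 4625: An account failed to log on. Account Name: admin. "
    "Source Network Address: 203.0.113.7. Logon Type: 3."
)

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,  # pinning temperature reduces run-to-run variation,
                    # one source of the inconsistent answers described above
    messages=[
        {
            "role": "system",
            "content": "You are a cybersecurity assistant. Flag potentially "
                       "malicious activity in the log excerpt and explain why.",
        },
        {"role": "user", "content": log_excerpt},
    ],
)

print(response.choices[0].message.content)
```

Even with deterministic settings, a general-purpose model can still produce plausible-sounding but wrong findings, which is exactly the failure mode the researchers describe.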

Greenwald disclosed that the early demonstrations were Microsoft’s initial explorations into the capabilities of GPT-4, which saw the AI frequently providing inaccurate and even nonsensical responses. This issue, known as “hallucination,” appeared to be widespread in the technology at the time.

Furthermore, the AI's reliance on GPT-4 without cybersecurity-specific training data compounded the problem: the model operated solely on its broad general dataset, with no grounding in real-world cybersecurity data.
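One standard remedy for that gap, offered here as a generic technique rather than Microsoft's documented approach, is to retrieve relevant domain reference material first and inject it into the prompt, so the model answers against real cybersecurity data instead of only its general training set. A minimal sketch, where search_security_kb is a hypothetical stand-in for a real knowledge base:

```python
# Sketch of retrieval-grounded prompting (a generic technique, not
# Microsoft's documented approach). search_security_kb is a
# hypothetical stand-in for a real knowledge-base lookup.
from openai import OpenAI

client = OpenAI()

def search_security_kb(query: str) -> str:
    """Hypothetical lookup returning reference text, e.g. an event-ID
    description from vetted security documentation."""
    return (
        "Event 4625 records a failed logon; repeated occurrences from a "
        "single source address can indicate a brute-force attempt."
    )

def grounded_answer(question: str) -> str:
    # Fetch domain context, then constrain the model to answer from it.
    context = search_security_kb(question)
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Answer using only the reference material provided.",
            },
            {
                "role": "user",
                "content": f"Reference material:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("Is a burst of event 4625 from one IP suspicious?"))
```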

The Implications of the Leak

The revelation of “cherry-picked” examples raises concerns about the accuracy and reliability of Microsoft's AI capabilities, and about how those examples may have been used in presentations to government and other prospective clients. It remains unclear whether Microsoft employed these specific instances in its client pitches, or whether such candid admissions stayed within the company's internal discussions of the technology.

In response to the leak, a Microsoft spokesperson clarified that the technology discussed in the meeting was preliminary work predating Security Copilot, and that it was evaluated on simulations built from public datasets; the spokesperson emphasized that no customer data was involved.

Adding to the Ongoing AI Debate

The leaked audio and subsequent disclosures add to the broader ongoing conversation about the challenges and limitations of generative AI. They underscore how difficult it is to obtain accurate, dependable results from AI models, particularly in specialized domains like cybersecurity.

Microsoft’s experience with Security Copilot serves as a reminder of the intricacies involved in training and fine-tuning AI systems for specific applications, and the potential impact of their unrefined output on critical decision-making processes.

The selective use of AI examples revealed in the leaked audio further underscores the need for transparency and rigorous real-world evaluation of AI technologies, and highlights the work still required to make generative AI reliable and effective across different domains.

Source: Futurism
