After reviewing thousands of benchmarks used in AI development, a Stanford team found that 5% could have serious flaws with far-reaching ramifications.After reviewing thousands of benchmarks used in AI development, a Stanford team found that 5% could have serious flaws with far-reaching ramifications.[#item_full_content]