When people watch a scene in the film “Jurassic Park” where a giant dinosaur walks toward them, they naturally imagine a heavy, rumbling sound, as if the ground were shaking. This is because humans predict sound by considering not only the shape of an object, but also physical properties such as its size, weight, and speed of movement. However, existing video-to-audio generation AI mainly generates sound based on the category of objects or scene information in the video, and has not sufficiently reflected physical properties that vary depending on weight or speed.