Internal Consistency

Contrast cases 4

OmniGen2#3 Und 1.0 Gen 0.25 Unify 0.25 · 9 inconsistent
Generation prompt
a pink computer keyboard with rounded edges sits to the bottom-right of a bright orange pair of scissors. the keyboard has seventeen keys visible, each with a distinct color, and the scissors are positioned at a 45-degree angle. the background is plain white, and both objects rest on a smooth, light gray surface.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Are there seventeen visible keys on keyboard? | ✓ yes | ✗ no |
| Are the scissors on the surface? | ✓ yes | ✗ no |
| Is the keyboard on the surface? | ✓ yes | ✗ no |
| Is the surface white? | ✓ yes | ✓ yes |
| Is there a surface? | ✓ yes | ✓ yes |
| Is the keyboard's main color pink? | ✓ yes | ✗ no |
| Are the scissors bright orange? | ✓ yes | ✗ no |
| Is there a keyboard? | ✓ yes | ✗ no |
| Is the surface smooth? | ✓ yes | ✓ yes |
| Is there a pair of scissors? | ✓ yes | ✗ no |
| Do the keys have distinct colors? | ✓ yes | ✗ no |
| Is the keyboard to the bottom-right of the scissors? | ✓ yes | ✗ no |
BAGEL-7B-MoT#386 Und 0.826 Gen 0.435 Unify 0.391 · 10 inconsistent
Generation prompt
white shorts, three purple concrete pathways, a blue wrench, three cylindrical purple glass glossy jars, and black headphones, where the shorts are atop the pathways, and the wrench is on the left of the shorts, and the jars are on the left of the shorts, and the headphones are on the right of the shorts, under a focused spotlight.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is the focused spotlight over the pathways? | ✓ yes | ✗ no |
| Is the focused spotlight over the shorts? | ✗ no | ✗ no |
| Are the jars purple? | ✓ yes | ✓ yes |
| Are there 3 jars? | ✓ yes | ✓ yes |
| Are there pathways? | ✓ yes | ✗ no |
| Are there jars? | ✓ yes | ✓ yes |
| Is the wrench blue? | ✓ yes | ✓ yes |
| Is there wrench? | ✓ yes | ✓ yes |
| Are the pathways purple? | ✓ yes | ✗ no |
| Is there headphones? | ✓ yes | ✓ yes |
| Are there 3 pathways? | ✓ yes | ✗ no |
| Is the focused spotlight over the headphones? | ✗ no | ✗ no |
| Is the focused spotlight over the wrench? | ✗ no | ✗ no |
| Is there shorts? | ✓ yes | ✗ no |
| Is the shorts white? | ✓ yes | ✗ no |
| Is the focused spotlight over the jars? | ✗ no | ✓ yes |
| Is jar glossy? | ✓ yes | ✓ yes |
| Is there focused spotlight? | ✓ yes | ✓ yes |
| Is the wrench on the left of the shorts? | ✓ yes | ✗ no |
| Is the headphones black? | ✓ yes | ✓ yes |
| Is the shorts atop the pathways? | ✓ yes | ✗ no |
| Is the headphones on the right of the shorts? | ✓ yes | ✗ no |
| Are the jars on the left of the shorts? | ✓ yes | ✗ no |
UniWorld-V1#451 Und 0.952 Gen 0.143 Unify 0.143 · 17 inconsistent
Generation prompt
a turquoise metallic keychain is in front of a yellow stone rock and between four gray fabric ribbed sweaters, with four gray fabric sweaters in front of a yellow stone rock, with a gray metallic spoon between four gray fabric sweaters and below a turquoise metallic keychain, all on a city street, under professional studio lighting.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is there spoon? | ✓ yes | ✗ no |
| Is there stone rock? | ✓ yes | ✗ no |
| Is the keychain turquoise? | ✓ yes | ✗ no |
| Are the sweaters gray? | ✓ yes | ✓ yes |
| Is the spoon gray? | ✓ yes | ✗ no |
| Is there keychain? | ✓ yes | ✗ no |
| Is there professional studio lighting? | ✗ no | ✗ no |
| Is there city street? | ✓ yes | ✗ no |
| Is sweater ribbed? | ✓ yes | ✗ no |
| Are there sweaters? | ✓ yes | ✓ yes |
| Are there 4 sweaters? | ✓ yes | ✓ yes |
| Is the spoon on the city street? | ✓ yes | ✗ no |
| Is the stone rock yellow? | ✓ yes | ✗ no |
| Are the sweaters in front of the stone rock? | ✓ yes | ✗ no |
| Are the sweaters on the city street? | ✓ yes | ✗ no |
| Is the stone rock on the city street? | ✓ yes | ✗ no |
| Is the keychain on the city street? | ✓ yes | ✗ no |
| Is the spoon below the keychain? | ✓ yes | ✗ no |
| Is the spoon between the sweaters? | ✓ yes | ✗ no |
| Is the keychain in front of the stone rock? | ✓ yes | ✗ no |
| Is the keychain between the sweaters? | ✓ yes | ✗ no |
SEED-X-17B#407 Und 0.947 Gen 0.0 Unify 0.0 · 18 inconsistent
Generation prompt
a purple ribbed sweater, four rectangular blue microwaves, three blue airplanes, a blue shovel, and four green cakes, where the sweater is on the side of the microwaves, and the airplanes are behind the sweater, and the shovel is on the right of the sweater, and the cakes are in front of the sweater, in harsh daylight.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Are there microwaves? | ✓ yes | ✗ no |
| Is there shovel? | ✓ yes | ✗ no |
| Are there 4 microwaves? | ✓ yes | ✗ no |
| Are the microwaves blue? | ✓ yes | ✗ no |
| Are there airplanes? | ✓ yes | ✗ no |
| Are there 3 airplanes? | ✓ yes | ✗ no |
| Are the airplanes blue? | ✓ yes | ✗ no |
| Is the shovel blue? | ✓ yes | ✗ no |
| Are there cakes? | ✓ yes | ✗ no |
| Are there 4 cakes? | ✓ yes | ✗ no |
| Is there sweater? | ✓ yes | ✗ no |
| Is sweater ribbed? | ✓ yes | ✗ no |
| Are the cakes in front of the sweater? | ✓ yes | ✗ no |
| Is the sweater purple? | ✓ yes | ✗ no |
| Are the airplanes behind the sweater? | ✗ no | ✗ no |
| Is the sweater on the side of the microwaves? | ✓ yes | ✗ no |
| Are the cakes green? | ✓ yes | ✗ no |
| Is there harsh daylight? | ✓ yes | ✗ no |
| Is the shovel on the right of the sweater? | ✓ yes | ✗ no |
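The header scores of each case appear to follow directly from its question rows. The sketch below is a minimal illustration under assumptions inferred from the listed numbers, not stated by the source: Und and Gen are the fractions of ✓ answers in each column, Unify is the fraction of questions answered ✓ in both, and the "inconsistent" count covers questions answered ✓ on the prompt but ✗ on the generated image. The name `consistency_scores` and the Y/N encoding are illustrative; the answers are those of BAGEL-7B-MoT#386.

```python
def consistency_scores(und, gen):
    """Score one case from two equal-length 'Y'/'N' strings (one char per question)."""
    if len(und) != len(gen) or not und:
        raise ValueError("need two non-empty answer strings of equal length")
    n = len(und)
    pairs = list(zip(und, gen))
    both_yes = sum(u == "Y" and g == "Y" for u, g in pairs)
    # Assumed definition: understood from the prompt but not realised in the image.
    inconsistent = sum(u == "Y" and g == "N" for u, g in pairs)
    return {
        "und": round(und.count("Y") / n, 3),
        "gen": round(gen.count("Y") / n, 3),
        "unify": round(both_yes / n, 3),
        "inconsistent": inconsistent,
    }


# Answers for BAGEL-7B-MoT#386, in the order the questions are listed.
UND = "YNYYYYYYYYYNNYYNYYYYYYY"
GEN = "NNYYNYYYNYNNNNNYYYNYNNN"
scores = consistency_scores(UND, GEN)
print(scores)
```

Under these assumptions the result matches the case header (Und 0.826, Gen 0.435, Unify 0.391, 10 inconsistent). The one row answered ✗ on the prompt but ✓ on the image is not counted as inconsistent.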

Good cases 4

OmniGen2#48 Und 1.0 Gen 1.0 Unify 1.0 · consistent ✓
Generation prompt
Three figures walk along golden sand near turquoise waves; two adults in yellow polka-dot swimwear, one child in red, holding hands. Two white seagulls with black-tipped wings soar above. A red-and-white striped lighthouse stands on a mossy rock island in the distance. A pink shell rests near driftwood.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is the shell pink? | ✓ yes | ✓ yes |
| Is there golden sand? | ✓ yes | ✓ yes |
| Are there seagulls? | ✓ yes | ✓ yes |
| Is there a rock island? | ✓ yes | ✓ yes |
| Is there a lighthouse? | ✓ yes | ✓ yes |
| Is the lighthouse on the rock island? | ✓ yes | ✓ yes |
| Is the rock island mossy? | ✓ yes | ✓ yes |
| Are the seagulls soaring? | ✓ yes | ✓ yes |
| Is there a child? | ✓ yes | ✓ yes |
| Are there adults? | ✓ yes | ✓ yes |
| Are there turquoise waves? | ✓ yes | ✓ yes |
| Is the lighthouse red-and-white striped? | ✓ yes | ✓ yes |
| Are the seagulls' wings black-tipped? | ✓ yes | ✓ yes |
| Are there two seagulls? | ✓ yes | ✓ yes |
| Is the lighthouse in the distance from the figures? | ✓ yes | ✓ yes |
| Are there three figures? | ✓ yes | ✓ yes |
| Is the pink shell near the driftwood? | ✓ yes | ✓ yes |
| Is there a pink shell? | ✓ yes | ✓ yes |
| Is there moss? | ✓ yes | ✓ yes |
| Are the seagulls above the figures? | ✓ yes | ✓ yes |
| Are the figures holding hands with each other? | ✓ yes | ✓ yes |
| Are there figures? | ✓ yes | ✓ yes |
| Are there wings? | ✓ yes | ✓ yes |
| Are the adults wearing yellow polka-dot swimwear? | ✓ yes | ✓ yes |
| Are the figures near the golden sand? | ✓ yes | ✓ yes |
| Are the seagulls white? | ✓ yes | ✓ yes |
| Are the figures near the turquoise waves? | ✓ yes | ✓ yes |
| Is the child wearing red? | ✓ yes | ✓ yes |
| Are the figures walking? | ✓ yes | ✓ yes |
| Is there driftwood? | ✓ yes | ✓ yes |
BAGEL-7B-MoT#40 Und 1.0 Gen 1.0 Unify 1.0 · consistent ✓
Generation prompt
One white rounded ceramic teapot with a lid sits on a round saucer atop a smooth wooden table; two black straight chopsticks lie beside it, near a blue weathered ladder leaning against the wall beside a frost-covered window.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is the teapot on the saucer? | ✓ yes | ✓ yes |
| Is the saucer on the table? | ✓ yes | ✓ yes |
| Are the chopsticks straight? | ✓ yes | ✓ yes |
| Are the chopsticks black? | ✓ yes | ✓ yes |
| Are the chopsticks beside the teapot? | ✓ yes | ✓ yes |
| Is there a ladder? | ✓ yes | ✓ yes |
| Is the ladder weathered? | ✓ yes | ✓ yes |
| Is the ladder blue? | ✓ yes | ✓ yes |
| Is the ladder beside the window? | ✓ yes | ✓ yes |
| Is the window frost-covered? | ✓ yes | ✓ yes |
| Is the teapot made of ceramic? | ✓ yes | ✓ yes |
| Is there a saucer? | ✓ yes | ✓ yes |
| Is the teapot white? | ✓ yes | ✓ yes |
| Is the teapot rounded? | ✓ yes | ✓ yes |
| Is there a teapot? | ✓ yes | ✓ yes |
| Does the teapot have a lid? | ✓ yes | ✓ yes |
| Is there a window? | ✓ yes | ✓ yes |
| Is the ladder leaning against the wall? | ✓ yes | ✓ yes |
| Is there a table? | ✓ yes | ✓ yes |
| Is the saucer round? | ✓ yes | ✓ yes |
| Is the table smooth? | ✓ yes | ✓ yes |
| Is the table made of wood? | ✓ yes | ✓ yes |
| Are there chopsticks? | ✓ yes | ✓ yes |
| Are there two chopsticks? | ✓ yes | ✓ yes |
UniWorld-V1#48 Und 1.0 Gen 0.933 Unify 0.933 · 2 inconsistent
Generation prompt
Three figures walk along golden sand near turquoise waves; two adults in yellow polka-dot swimwear, one child in red, holding hands. Two white seagulls with black-tipped wings soar above. A red-and-white striped lighthouse stands on a mossy rock island in the distance. A pink shell rests near driftwood.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is the shell pink? | ✓ yes | ✓ yes |
| Is there golden sand? | ✓ yes | ✓ yes |
| Are there seagulls? | ✓ yes | ✓ yes |
| Is there a rock island? | ✓ yes | ✓ yes |
| Is there a lighthouse? | ✓ yes | ✓ yes |
| Is the lighthouse on the rock island? | ✓ yes | ✓ yes |
| Is the rock island mossy? | ✓ yes | ✓ yes |
| Are the seagulls soaring? | ✓ yes | ✓ yes |
| Is there a child? | ✓ yes | ✓ yes |
| Are there adults? | ✓ yes | ✓ yes |
| Are there turquoise waves? | ✓ yes | ✓ yes |
| Is the lighthouse red-and-white striped? | ✓ yes | ✓ yes |
| Are the seagulls' wings black-tipped? | ✓ yes | ✓ yes |
| Are there two seagulls? | ✓ yes | ✓ yes |
| Is the lighthouse in the distance from the figures? | ✓ yes | ✓ yes |
| Are there three figures? | ✓ yes | ✓ yes |
| Is the pink shell near the driftwood? | ✓ yes | ✗ no |
| Is there a pink shell? | ✓ yes | ✓ yes |
| Is there moss? | ✓ yes | ✓ yes |
| Are the seagulls above the figures? | ✓ yes | ✓ yes |
| Are the figures holding hands with each other? | ✓ yes | ✓ yes |
| Are there figures? | ✓ yes | ✓ yes |
| Are there wings? | ✓ yes | ✓ yes |
| Are the adults wearing yellow polka-dot swimwear? | ✓ yes | ✓ yes |
| Are the figures near the golden sand? | ✓ yes | ✓ yes |
| Are the seagulls white? | ✓ yes | ✓ yes |
| Are the figures near the turquoise waves? | ✓ yes | ✓ yes |
| Is the child wearing red? | ✓ yes | ✓ yes |
| Are the figures walking? | ✓ yes | ✗ no |
| Is there driftwood? | ✓ yes | ✓ yes |
SEED-X-17B#72 Und 0.941 Gen 0.941 Unify 0.882 · 1 inconsistent
Generation prompt
One black rectangular smartphone with a blue screen displaying white text "facebook" rests on textured blue denim, near a golden pants button and yellow stitching.
original (reference)
model-generated
| VQA question | Understand (on prompt) | Generate (on its image) |
| --- | --- | --- |
| Is the denim textured? | ✓ yes | ✓ yes |
| Is the smartphone near the stitching? | ✓ yes | ✓ yes |
| Is the smartphone near the pants button? | ✓ yes | ✓ yes |
| Is the smartphone on the denim? | ✓ yes | ✓ yes |
| Is the pants button golden? | ✓ yes | ✓ yes |
| Does the text say "facebook"? | ✓ yes | ✗ no |
| Is the stitching yellow? | ✓ yes | ✓ yes |
| Is the screen blue? | ✓ yes | ✓ yes |
| Is there a smartphone? | ✓ yes | ✓ yes |
| Is the text white? | ✓ yes | ✓ yes |
| Is there a screen? | ✓ yes | ✓ yes |
| Is there denim? | ✓ yes | ✓ yes |
| Is there text? | ✓ yes | ✓ yes |
| Is there stitching? | ✓ yes | ✓ yes |
| Is the smartphone black? | ✓ yes | ✓ yes |
| Is there a pants button? | ✗ no | ✓ yes |
| Is the smartphone rectangular? | ✓ yes | ✓ yes |

Understanding-Guided Generation

Contrast cases 8

OmniGen2#400 IoU 0.0 Edit 0.25 Unify 0.125
Instruction: Remove the first cupcake in the diagonal line from top to bottom, featuring a round golden base topped with swirled white whipped cream, yellow banana slices placed on top, and sprinkled with brown chopped nuts.
Target op: nan
GT region model bbox
resulting edit
OmniGen2#503 IoU 0.0 Edit 0.25 Unify 0.125
Instruction: Remove the bicycle to the left of the dining table and in front of the bookshelf, mounted upright on a wall rack, behind another bicycle and leaning against the brick wall.
Target op: nan
GT region model bbox
resulting edit
BAGEL-7B-MoT#47 IoU 0.0 Edit 0.25 Unify 0.125
Instruction: Add a small blue stripe along the lower edge of the fuselage of the airplane with green and yellow livery, featuring a rounded nose and displaying the text "www.nokair.com" and "Ins. 1318", located in front of the purple and pink airplane.
Target op: airplane | add small blue stripe along lower fuselage edge
GT region model bbox
resulting edit
BAGEL-7B-MoT#351 IoU 0.0 Edit 0.25 Unify 0.125
Instruction: Replace the wine glass held by the woman, who is standing next to a man in a suit, with the man also holding a wine glass and a cell phone, with a small ceramic teapot.
Target op: wine glass | replace with small ceramic teapot
GT region model bbox
resulting edit
UniWorld-V1#481 IoU 0.073 Edit 0.083 Unify 0.078
Instruction: Remove the hardcover book, which is positioned directly below another book on top of the laptop.
Target op: nan
GT region model bbox
resulting edit
UniWorld-V1#127 IoU 0.011 Edit 0.167 Unify 0.089
Instruction: Change the color of the green double-decker bus labeled "Edinburgh Tour" with the text "buy your ticket from the driver" and the website "edinburghour.com", positioned in front of a building with classical architecture featuring columns and located behind another double-decker bus, to blue.
Target op: bus | change color to blue
GT region model bbox
resulting edit
SEED-X-17B#512 IoU 0.05 Edit 0.167 Unify 0.108
Instruction: Add a small Trump's portrait, mounted on the front right corner of the window located on the kitchen counter to the left of the sink, mounted above a wooden counter with open shelving underneath, and positioned above a stainless steel sink that is installed in the same counter.
Target op: window | add a small Trump's portrait
GT region model bbox
resulting edit
SEED-X-17B#503 IoU 0.191 Edit 0.167 Unify 0.179
Instruction: Remove the bicycle to the left of the dining table and in front of the bookshelf, mounted upright on a wall rack, behind another bicycle and leaning against the brick wall.
Target op: nan
GT region model bbox
resulting edit
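In the cases listed here, Unify equals the mean of IoU and Edit to within rounding (for example, (0.073 + 0.083) / 2 = 0.078 for UniWorld-V1#481). The sketch below illustrates both quantities under assumptions: boxes are axis-aligned and given as `(x1, y1, x2, y2)`, IoU is the usual intersection-over-union between the GT region and the model bbox, and the function names are hypothetical. The page does not say how the Edit score itself is computed, so it is taken as an input.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap width and height, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def unify_score(iou, edit):
    """Assumed aggregation: plain mean of the localisation and edit scores."""
    return (iou + edit) / 2


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(round(unify_score(0.073, 0.083), 3))
```

An IoU of 0.0, as in several contrast cases, means the predicted box does not overlap the ground-truth region at all, which caps Unify at half the Edit score under this aggregation.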

Good cases 8

OmniGen2#57 IoU 0.992 Edit 1.0 Unify 0.996
Instruction: Change the surface material of the dining table, which features a chocolate donut covered in colorful sprinkles on a ceramic plate with a green floral border and a ceramic mug with a matching green floral border positioned behind and to the right of the donut, from light gray laminate to dark walnut wood.
Target op: dining table | change surface material to dark walnut wood
GT region model bbox
resulting edit
OmniGen2#225 IoU 0.977 Edit 1.0 Unify 0.989
Instruction: Change the color of the bedspread on the bed on the far left of the room, featuring a wooden frame and a plaid bedspread, positioned to the left of the other two beds, to solid blue.
Target op: bedspread on the bed | change plaid bedspread to solid blue
GT region model bbox
resulting edit
BAGEL-7B-MoT#170 IoU 0.99 Edit 1.0 Unify 0.995
Instruction: Add a blue LED light strip along the top interior edge of the refrigerator with a man inside, holding a yellow Meko carton, which is open and made of plastic, to enhance visibility of the contents.
Target op: refrigerator | add blue LED strip along top interior edge for better visibility
GT region model bbox
resulting edit
BAGEL-7B-MoT#485 IoU 0.989 Edit 1.0 Unify 0.995
Instruction: Replace the chair positioned in front of the desk, which has a laptop on the left side and a desktop monitor on the right side, both placed on the desk surface, with a modern ergonomic office stool with a high backrest and adjustable height.
Target op: chair | replace with modern ergonomic office stool with high backrest and adjustable height
GT region model bbox
resulting edit
SEED-X-17B#93 IoU 0.565 Edit 1.0 Unify 0.783
Instruction: Remove the person standing to the right of the skateboard, wearing blue pants and Vans sneakers with white and black coloring, a black-yellow shirt, positioned next to a person wearing purple shoes.
Target op: nan
GT region model bbox
resulting edit
SEED-X-17B#173 IoU 0.563 Edit 1.0 Unify 0.781
Instruction: Change the color of the broccoli located below the grapes and the noodles, with green celery placed on top of it, all situated on a round ceramic plate, to bright yellow.
Target op: broccoli | change color to bright yellow
GT region model bbox
resulting edit
UniWorld-V1#406 IoU 0.516 Edit 0.417 Unify 0.466
Instruction: Remove the backpack worn by the man in the red sweater, who is standing to the right of the woman holding the pink open box containing donuts.
Target op: nan
GT region model bbox
resulting edit
UniWorld-V1#304 IoU 0.423 Edit 0.417 Unify 0.42
Instruction: Replace the tan dog positioned in front of a light brown poodle and next to a white dog, with a small gray tabby cat.
Target op: dog | replace with small gray tabby cat
GT region model bbox
resulting edit

Generation-Guided Understanding

Cases by spatial sub-task 12

OmniGen2#84 2d_spatial GT E Direct D Guided E GenQ 0.091
Given a matrix, where each square contains a number, rotate the matrix 270 degrees clockwise to find the row and column of the number 2. (Choose from A, B, C, D, E, F, G) Options: A: (5, 5) B: (5, 3) C: (4, 4) D: (4, 2) F: (1, 1) G: (5, 2) E: (1, 5) Answer ONLY option
input scene
self-generated reference
OmniGen2#187 3d_spatial GT A Direct B Guided A GenQ 0.571
Given a cube net. "bottom" represents the base of the cube. Which color on its left view after folding it into a cube? (Choose from A, B, C, D, E, F, G) Options: A: Purple B: Green C: Red D: Brown E: Yellow F: Blue G: Unsure Answer ONLY option
input scene
self-generated reference
OmniGen2#516 complex_relation GT C Direct A Guided C GenQ 0.867
A cook is reading at a study table. A wardrobe is directly behind the study table. A toolbox is to the far left and in front of the study table. A sofa is to the left and behind the study table. You can see a couch to the far left and behind the study table. There is a picture frame to the right and behind the study table. You can see a recycling bin directly in front of the study table. What is the position of the recycling bin relative to the wardrobe? (Choose from A, B, C, D, E, F, G, H, I) Options: A: Left B: Right C: Front D: Back E: Left front F: Left back G: Right front H: Right back I: Unsure Answer ONLY option
self-generated reference
BAGEL-7B-MoT#8 2d_spatial GT D Direct B Guided D GenQ 0.091
Given a matrix, where each square contains a number, rotate the matrix 90 degrees clockwise to find the row and column of the number 6. (Choose from A, B, C, D, E, F, G) Options: A: (3, 3) B: (1, 1) C: (2, 1) E: (2, 2) F: (2, 3) G: (3, 2) D: (1, 3) Answer ONLY option
input scene
self-generated reference
BAGEL-7B-MoT#232 3d_spatial GT B Direct F Guided B GenQ 0.571
Given a cube net. "bottom" represents the base of the cube. Which color on its top view after folding it into a cube? (Choose from A, B, C, D, E, F, G) Options: A: Purple B: Brown C: Red D: Yellow E: Blue F: Green G: Unsure Answer ONLY option
input scene
self-generated reference
BAGEL-7B-MoT#463 complex_relation GT F Direct D Guided F GenQ 0.909
A woman is sitting at an office desk. There is a stepladder to the far right and in front of the office desk. A computer monitor is to the left and behind the office desk. You can see a dryer to the far left and behind the office desk. There is an electric fan to the right and in front of the office desk. What is the position of the computer monitor relative to the electric fan? (Choose from A, B, C, D, E, F, G, H, I) Options: A: Left B: Right C: Front D: Back E: Left front F: Left back G: Right front H: Right back I: Unsure Answer ONLY option
self-generated reference
UniWorld-V1#134 2d_spatial GT D Direct B Guided D GenQ 0.182
Given a matrix, where each square contains a number, rotate the matrix 90 degrees clockwise to find the row and column of the number 36. (Choose from A, B, C, D, E, F, G) Options: A: (1, 4) B: (3, 2) C: (4, 5) E: (6, 5) F: (1, 1) G: (6, 2) D: (3, 4) Answer ONLY option
input scene
self-generated reference
UniWorld-V1#196 3d_spatial GT E Direct A Guided E GenQ 0.429
Given a cube net. "bottom" represents the base of the cube. Which color on its back view after folding it into a cube? (Choose from A, B, C, D, E, F, G) Options: A: Purple B: Brown C: Red D: Yellow E: Blue F: Green G: Unsure Answer ONLY option
input scene
self-generated reference
UniWorld-V1#413 complex_relation GT B Direct E Guided B GenQ 0.727
A girl is sitting at a dining table. There is a standing fan to the left and behind the dining table. There is a bookshelf to the right of the dining table. You can see a dresser directly in front of the dining table. A side table is to the left and in front of the dining table. What is the position of the dresser relative to the side table? (Choose from A, B, C, D, E, F, G, H, I) Options: A: Left B: Right C: Front D: Back E: Left front F: Left back G: Right front H: Right back I: Unsure Answer ONLY option
self-generated reference
SEED-X-17B#57 2d_spatial GT A Direct D Guided A GenQ 0.091
Given a matrix, where each square contains a number, rotate the matrix 270 degrees clockwise to find the row and column of the number 6. (Choose from A, B, C, D, E, F, G) Options: B: (1, 2) C: (1, 4) D: (3, 3) E: (3, 2) F: (2, 4) G: (2, 3) A: (4, 3) Answer ONLY option
input scene
self-generated reference
SEED-X-17B#260 3d_spatial GT D Direct B Guided D GenQ 0.571
Given a cube net. "bottom" represents the base of the cube. Which color on its front view after folding it into a cube? (Choose from A, B, C, D, E, F, G) Options: A: Purple B: Blue C: Red D: Yellow E: Green F: Brown G: Unsure Answer ONLY option
input scene
self-generated reference
SEED-X-17B#412 complex_relation GT D Direct A Guided D GenQ 0.615
A boy is standing at a workbench. There is a picture frame to the far left and behind the workbench. You can see a mirror to the left and in front of the workbench. There is a side table to the left and behind the workbench. A cabinet is to the right of the workbench. A chest of drawers is directly in front of the workbench. What is the position of the side table relative to the mirror? (Choose from A, B, C, D, E, F, G, H, I) Options: A: Left B: Right C: Front D: Back E: Left front F: Left back G: Right front H: Right back I: Unsure Answer ONLY option
self-generated reference
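The 2d_spatial items above ask for the position of a number after a clockwise rotation of a matrix. The input matrices are images and are not reproduced on this page, so the sketch below uses a made-up 2×2 matrix. It assumes answers are 1-indexed (row, column) pairs, which fits the option format but is not confirmed by the source; `rotate_clockwise` and `locate` are illustrative names.

```python
def rotate_clockwise(matrix, degrees):
    """Rotate a list-of-lists matrix clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    for _ in range((degrees // 90) % 4):
        # One 90-degree clockwise turn: reverse the row order, then transpose.
        matrix = [list(row) for row in zip(*matrix[::-1])]
    return matrix


def locate(matrix, value):
    """Return the 1-indexed (row, column) of value, or None if it is absent."""
    for r, row in enumerate(matrix, start=1):
        for c, cell in enumerate(row, start=1):
            if cell == value:
                return (r, c)
    return None


# Hypothetical example matrix, not taken from the benchmark.
example = [[1, 2],
           [3, 4]]
rotated = rotate_clockwise(example, 270)
print(rotated, locate(rotated, 2))
```

A 270-degree clockwise rotation is the same as a 90-degree counter-clockwise one, so the top-right entry of the example moves to the top-left corner.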

Mutual Enhancement

Contrast cases 8

OmniGen2#18 Rounds 8 Und(g2u) 0.0 Gen(u2g) 0.083 Unify 0.042
Initial instruction: Replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 8 rounds · 4 edits
Original
R1 · edit
Edit: Replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R2 · understanding
No, replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R3 · edit
Edit: replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R4 · understanding
No, remove the left shoe.
R5 · edit
Edit: remove the left shoe.
R6 · understanding
No, replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R7 · edit
Edit: replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R8 · understanding
No, replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 5 rounds · 2 edits
Original
R1 · understanding
No, 1. The shoes are described as "tan leather woven loafer," but the image shows orange shoes.
R2 · edit
Edit: Based on the instruction: 1. The shoes are described as "tan leather woven loafer," but the image shows orange shoes., please perform image editing task.
R3 · understanding
No, 1. The shoe is orange, not tan.
R4 · edit
Edit: Based on the instruction: 1. The shoe is orange, not tan., please perform image editing task.
R5 · understanding
No, 1. The shoe is orange, not tan.
Target caption: A single tan leather woven loafer with intricate braided patterns across the upper and sides, featuring decorative blue keys with gold-colored metal caps at the vamp. The shoe has a soft, slip-on design with a black rubber sole and rests on a white surface against a textured off-white wall, showcasing detailed stitching and a polished, artisanal finish.
OmniGen2#169 Rounds 5 Und(g2u) 0.0 Gen(u2g) 0.125 Unify 0.062
Initial instruction: Replace the pool to fountain, remove the fire pit with the burning fire.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 4 rounds · 2 edits
Original
R1 · edit
Edit: Replace the pool to fountain, remove the fire pit with the burning fire.
R2 · understanding
No, replace the pool to fountain, remove the fire pit with the burning fire.
R3 · edit
Edit: replace the pool to fountain, remove the fire pit with the burning fire.
R4 · understanding
No, replace the pool to fountain, remove the fire pit with the burning fire.
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 5 rounds · 2 edits
Original
R1 · understanding
No, 1. fire pit:umbrellas
R2 · edit
Edit: Based on the instruction: 1. fire pit:umbrellas, please perform image editing task.
R3 · understanding
No, 1. fire pit:fire pit
R4 · edit
Edit: Based on the instruction: 1. fire pit:fire pit, please perform image editing task.
R5 · understanding
No, 1. fire pit:fire pit
Target caption: At dusk, the tropical resort features a long, glowing fountain surrounded by lush grass and towering palm trees silhouetted against a soft blue sky. Wooden lounge chairs and umbrellas are arranged around the pool, without fire pit. In the background, warmly lit bungalows nestle among the foliage, with the ocean visible on the horizon, creating a serene and inviting atmosphere.
BAGEL-7B-MoT#18 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.083 Unify 0.042
Initial instruction: Replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 2 rounds · 1 edit
Original
R1 · edit
Edit: Replace the orange tassels with blue keys, remove the left shoe, and change the color of the shoe's sole to black.
R2 · understanding
Yes
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 1 round · 0 edits
Original
R1 · understanding
No.
Target caption: A single tan leather woven loafer with intricate braided patterns across the upper and sides, featuring decorative blue keys with gold-colored metal caps at the vamp. The shoe has a soft, slip-on design with a black rubber sole and rests on a white surface against a textured off-white wall, showcasing detailed stitching and a polished, artisanal finish.
BAGEL-7B-MoT#126 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.083 Unify 0.042
Initial instruction: Replace the gold convertible car with a red bike, remove the poster on the wall with "NORTH SHORE Classics," and alter the floor pattern from checkered to solid black.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 2 rounds · 1 edit
Original
R1 · edit
Edit: Replace the gold convertible car with a red bike, remove the poster on the wall with "NORTH SHORE Classics," and alter the floor pattern from checkered to solid black.
R2 · understanding
No, the instruction is not perfectly satisfied. The red bike has not been added to the image, and the poster on the wall with "NORTH SHORE Classics" has not been removed. Additionally, the floor pattern has not been changed from checkered to solid black.
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 1 round · 0 edits
Original
R1 · understanding
No.
Target caption: A vibrant red 1960s bike is showcased in a classic showroom, its black soft top down and chrome accents shining under bright lights. The bike rests on a solid black diamond-plate floor, with the red-and-white checkered borders removed. The "North Shore Classics" poster on the wall has been taken down, leaving the space clear, while other vintage decor like the Chevrolet sign and life preserver remain, maintaining the showroom’s retro automotive theme.
UniWorld-V1#52 Rounds 10 Und(g2u) 0.0 Gen(u2g) 0.028 Unify 0.014
Initial instruction: Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 10 rounds · 5 edits
Original
R1 · edit
Edit: Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
R2 · understanding
No, Add a picnic blanket on the ground near the table The instruction to add a picnic blanket on the ground near the table is not perfectly satisfied in the edited image. The original image shows a picnic blanket on the ground near the table, but the edited image does not have a picnic blanket visible.
R3 · edit
Edit: Add a picnic blanket on the ground near the table The instruction to add a picnic blanket on the ground near the table is not perfectly satisfied in the edited image. The original image shows a picnic blanket on the ground near the table, but the edited image does not have a picnic blanket visible.
R4 · understanding
No, Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
R5 · edit
Edit: Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
R6 · understanding
No, Add a picnic blanket on the ground near the table The instruction to add a picnic blanket on the ground near the table is not perfectly satisfied in the edited image. The original image shows a picnic blanket on the ground near the table, but the edited image does not have this detail.
R7 · edit
Edit: Add a picnic blanket on the ground near the table The instruction to add a picnic blanket on the ground near the table is not perfectly satisfied in the edited image. The original image shows a picnic blanket on the ground near the table, but the edited image does not have this detail.
R8 · understanding
No, Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
R9 · edit
Edit: Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river.
R10 · understanding
No, [Add a picnic blanket on the ground near the table, alter the color of the tent to red, and replace the gray river with a clear blue river]
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 1 round · 0 edits
Original
R1 · understanding
No
Target caption: Two people sit in folding chairs by a gravel riverside campsite, facing a vibrant, clear blue river bordered by dense evergreen trees and distant snow-capped mountains under an overcast sky. To their left, a wooden picnic table holds food and drinks, with a colorful plaid blanket draped over the bench and another picnic blanket now spread on the ground nearby; to their right, a bright red dome tent stands with camping gear scattered around. The scene retains its peaceful, remote outdoor charm, now enhanced with vivid colors and a fresh, inviting atmosphere.
UniWorld-V1#43 Rounds 4 Und(g2u) 0.0 Gen(u2g) 0.042 Unify 0.021
Initial instruction: remove the flowering bush on the right, and change the driveway color to black.
U2G · Understanding-guided Generation: edit → check edit vs instruction → re-edit · 4 rounds · 2 edits
Original
R1 · edit
Edit: remove the flowering bush on the right, and change the driveway color to black.
R2 · understanding
No, remove the flowering bush on the right, and change the driveway color to black.
R3 · edit
Edit: remove the flowering bush on the right, and change the driveway color to black.
R4 · understanding
No, remove the flowering bush on the right, and change the driveway color to black.
G2U · Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect · 1 round · 0 edits
Original
R1 · understanding
No
Target caption: A two-story beige brick house with a tiled roof and arched entryway stands under a bright blue sky, featuring a large garage door, multiple windows with white trim, and a curved driveway paved with black hexagonal pavers. The lush green landscaping now lacks the vibrant pink flowering bushes on the right, replaced by neatly trimmed green shrubs, while palm trees remain in the background, giving the property a more subdued, classic appearance.
SEED-X-17B#103 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.042 Unify 0.021
Initial instruction: Remove the red onion, and change the color of the kale to purple.
U2GUnderstanding-guided Generation: edit → check edit vs instruction → re-edit2 rounds · 1 edits
Original
R1 · edit
Edit: Remove the red onion, and change the color of the kale to purple.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (1 round · 0 edits)
Original
R1 · understanding
No
Target caption: A vibrant assortment of fresh vegetables is arranged on a weathered wooden table outdoors, bathed in bright sunlight. The collection features golden potatoes, collard greens, deep purple kale, crisp green beans, and fresh parsley, all displayed with rich textures and natural hues, enhancing the visual contrast against the green leafy vegetables and rustic wooden surface.
SEED-X-17B#84 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.042 Unify 0.021
Initial instruction: Add a small bowl of whipped cream next to the waffle, remove the pink rose on the left, replace the golden fork with a silver spoon, and alter the color of the waffle to light blue.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Add a small bowl of whipped cream next to the waffle, remove the pink rose on the left, replace the golden fork with a silver spoon, and alter the color of the waffle to light blue.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (1 round · 0 edits)
Original
R1 · understanding
No
Target caption: A stack of light blue waffles sits on a white, scalloped-edge plate, topped with fresh blackberries and accompanied by a small bowl of whipped cream beside it. The plate rests on a soft pink cloth, with a silver spoon to the left and a golden spoon to the right, while a small bowl of blueberries adds a touch of color to the softly lit, pastel breakfast scene.
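In the case headers of this section, the reported Unify value appears to match the arithmetic mean of Und(g2u) and Gen(u2g) up to three-decimal rounding. This is an observation from the numbers shown here, not a documented definition, and other sections of the page may compute Unify differently. A minimal sketch that checks the relation, with the helper names `mean_unify` and `is_consistent` chosen only for illustration:

```python
# (Und(g2u), Gen(u2g), Unify) triples copied from the case headers in this section.
REPORTED = {
    "UniWorld-V1#43": (0.0, 0.042, 0.021),
    "SEED-X-17B#103": (0.0, 0.042, 0.021),
    "SEED-X-17B#84": (0.0, 0.042, 0.021),
    "OmniGen2#21": (0.8, 0.944, 0.872),
    "OmniGen2#251": (1.0, 0.667, 0.833),
    "BAGEL-7B-MoT#135": (1.0, 0.528, 0.764),
    "BAGEL-7B-MoT#176": (0.5, 1.0, 0.75),
    "UniWorld-V1#22": (0.909, 0.625, 0.767),
    "UniWorld-V1#184": (0.8, 0.625, 0.713),
    "SEED-X-17B#49": (0.0, 0.542, 0.271),
    "SEED-X-17B#191": (0.0, 0.542, 0.271),
}


def mean_unify(und: float, gen: float) -> float:
    """Assumed relation: Unify is the plain average of the two directional scores."""
    return (und + gen) / 2


def is_consistent(und: float, gen: float, unify: float, tol: float = 1e-3) -> bool:
    """True if the reported Unify agrees with the mean within display rounding."""
    return abs(mean_unify(und, gen) - unify) <= tol


if __name__ == "__main__":
    for name, (und, gen, unify) in REPORTED.items():
        print(name, is_consistent(und, gen, unify))
```

The tolerance of 0.001 only absorbs the three-decimal display rounding of the reported scores.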

Good cases 8

OmniGen2#21 Rounds 10 Und(g2u) 0.8 Gen(u2g) 0.944 Unify 0.872
Initial instruction: Add a small potted plant on the table in front of the sofa, alter the color of the sofa to dark green, and replace the gold chandelier with a modern white chandelier.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Add a small potted plant on the table in front of the sofa, alter the color of the sofa to dark green, and replace the gold chandelier with a modern white chandelier.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (10 rounds · 5 edits)
Original
R1 · understanding
No, 1. [mismatch1]: [The caption mentions a "sleek white chandelier," but the image shows a chandelier with a wooden frame and a warm, golden light. This is a mismatch.], 2. [mismatch2]: [The caption mentions a "small potted plant on the table in front of the sofa," but the image shows a large potted plant in the background, not on the table. This is a mismatch.]
R2 · edit
Edit: Based on the instruction: 1. [mismatch1]: [The caption mentions a "sleek white chandelier," but the image shows a chandelier with a wooden frame and a warm, golden light. This is a mismatch.], 2. [mismatch2]: [The caption mentions a "small potted plant on the table in front of the sofa," but the image shows a large potted plant in the background, not on the table. This is a mismatch.], please perf
R3 · understanding
No, 1. [mismatch1]: [The caption mentions a "sleek white chandelier," but the image shows a yellow chandelier.]
R4 · edit
Edit: Based on the instruction: 1. [mismatch1]: [The caption mentions a "sleek white chandelier," but the image shows a yellow chandelier.], please perform image editing task.
R5 · understanding
No, 1. [mismatch1]:The image shows a modernized lounge with a sleek white chandelier, while the caption mentions a "sleek white chandelier."
R6 · edit
Edit: Based on the instruction: 1. [mismatch1]:The image shows a modernized lounge with a sleek white chandelier, while the caption mentions a "sleek white chandelier.", please perform image editing task.
R7 · understanding
No, 1. [mismatch1]: [The image shows a modernized lounge with a sleek white chandelier and a small potted plant, while the caption mentions a "sleek white chandelier and a small potted plant on the table in front of the sofa." This suggests that the potted plant is on the table, not on the sofa, which is a mismatch.].
R8 · edit
Edit: Based on the instruction: 1. [mismatch1]: [The image shows a modernized lounge with a sleek white chandelier and a small potted plant, while the caption mentions a "sleek white chandelier and a small potted plant on the table in front of the sofa." This suggests that the potted plant is on the table, not on the sofa, which is a mismatch.]., please perform image editing task.
R9 · understanding
No, 1. [mismatch1]: [The image shows a modernized lounge with a sleek white chandelier and a small potted plant, while the caption mentions a dark green sofa and wooden chairs.]
R10 · edit
Edit: Based on the instruction: 1. [mismatch1]: [The image shows a modernized lounge with a sleek white chandelier and a small potted plant, while the caption mentions a dark green sofa and wooden chairs.], please perform image editing task.
Target caption: A group of older adults is gathered in a modernized lounge, seated on dark green sofas and wooden chairs arranged in a semi-circle. The space now features a sleek white chandelier and a small potted plant on the table in front of the sofa, complementing the updated decor while maintaining the warm, inviting atmosphere. Attendees remain engaged, some holding papers or pens, suggesting an ongoing meeting or discussion.
OmniGen2#251 Rounds 7 Und(g2u) 1.0 Gen(u2g) 0.667 Unify 0.833
Initial instruction: Add a yellow stripe above the red stripe on the boat and remove the black cover on the boat.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Add a yellow stripe above the red stripe on the boat and remove the black cover on the boat.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (7 rounds · 3 edits)
Original
R1 · understanding
No, 1. The caption mentions a yellow stripe above the red stripe, but the image does not show any yellow stripe.
R2 · edit
Edit: Based on the instruction: 1. The caption mentions a yellow stripe above the red stripe, but the image does not show any yellow stripe., please perform image editing task.
R3 · understanding
No, 1. The boat is not fully uncovered; it has a black cover on the top.
R4 · edit
Edit: Based on the instruction: 1. The boat is not fully uncovered; it has a black cover on the top., please perform image editing task.
R5 · understanding
No, 1. The boat is not fully uncovered; it still has a black cover on top.
R6 · edit
Edit: Based on the instruction: 1. The boat is not fully uncovered; it still has a black cover on top., please perform image editing task.
R7 · understanding
No, 1. The boat is not fully uncovered; it still has a black cover on top.
Target caption: A white and navy blue speedboat with a red stripe and a yellow stripe above it is parked on a trailer in a driveway, now fully uncovered and revealing its sleek deck and stainless steel railing. The boat features white fenders, a clean hull, and is mounted on a two-wheeled trailer with a worn white frame, set against a residential backdrop with slate-roofed houses and greenery under an overcast sky.
BAGEL-7B-MoT#135 Rounds 7 Und(g2u) 1.0 Gen(u2g) 0.528 Unify 0.764
Initial instruction: Add a lemon slice to the drink on the left, remove the tree on the right, and alter the color of the river to blue.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Add a lemon slice to the drink on the left, remove the tree on the right, and alter the color of the river to blue.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (7 rounds · 3 edits)
Original
R1 · understanding
No, 1.[mismatch1]: The river is not vibrant blue, it is a muddy brown color. 2.[mismatch2]: The palm tree is on the right, not the left.
R2 · edit
Edit: Based on the instruction: 1.[mismatch1]: The river is not vibrant blue, it is a muddy brown color. 2.[mismatch2]: The palm tree is on the right, not the left., please perform image editing task.
R3 · understanding
No, 1.[mismatch1]: The river is not vibrant blue, it is brown.
R4 · edit
Edit: Based on the instruction: 1.[mismatch1]: The river is not vibrant blue, it is brown., please perform image editing task.
R5 · understanding
No, 1.[mismatch1]: The drink on the left has a bright yellow lemon slice, but the image does not show a lemon slice.
R6 · edit
Edit: Based on the instruction: 1.[mismatch1]: The drink on the left has a bright yellow lemon slice, but the image does not show a lemon slice., please perform image editing task.
R7 · understanding
No
Target caption: Two glass mugs filled with iced coffee, each with a black bendy straw and ice cubes, sit on a weathered wooden table. The drink on the left now features a bright yellow lemon slice floating on top, while the river behind is vibrant blue, contrasting with the lush green hills. There is a palm tree on the left, leaving a clearer view of the serene, tropical landscape under a soft, overcast sky.
BAGEL-7B-MoT#176 Rounds 3 Und(g2u) 0.5 Gen(u2g) 1.0 Unify 0.75
Initial instruction: Add a disco ball above the dance floor, alter the woman's top to a solid red color, and replace the boombox with a turntable.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Add a disco ball above the dance floor, alter the woman's top to a solid red color, and replace the boombox with a turntable.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (3 rounds · 1 edit)
Original
R1 · understanding
No, 1.[mismatch1]: The woman on the left is wearing a solid red top, not a red top with a blue design.
R2 · edit
Edit: Based on the instruction: 1.[mismatch1]: The woman on the left is wearing a solid red top, not a red top with a blue design., please perform image editing task.
R3 · understanding
No.
Target caption: In the vibrant, retro disco scene, three animated characters dance energetically on a colorful, reflective floor, now illuminated by a shimmering disco ball suspended above. The woman on the left wears red shorts and a solid red top, and a sleek turntable at her feet. Glowing light trails and a neon-lit DJ booth enhance the 1970s atmosphere, with silhouetted dancers in the background adding to the lively energy.
UniWorld-V1#22 Rounds 10 Und(g2u) 0.909 Gen(u2g) 0.625 Unify 0.767
Initial instruction: Alter the color of the coffee to green and remove the latte art heart design.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (4 rounds · 2 edits)
Original
R1 · edit
Edit: Alter the color of the coffee to green and remove the latte art heart design.
R2 · understanding
No, Alter the color of the coffee to green and remove the latte art heart design.
R3 · edit
Edit: Alter the color of the coffee to green and remove the latte art heart design.
R4 · understanding
No, the color of the coffee has been altered to green, but the latte art heart design has not been removed.
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (10 rounds · 5 edits)
Original
R1 · understanding
No, 1.[mismatch1]: The caption describes a vibrant green coffee, while the image shows a brown coffee with latte art. 2.[mismatch2]: The caption mentions a heart-shaped design, which is not present in the image.
R2 · edit
Edit: Based on the instruction: 1.[mismatch1]: The caption describes a vibrant green coffee, while the image shows a brown coffee with latte art. 2.[mismatch2]: The caption mentions a heart-shaped design, which is not present in the image., please perform image editing task.
R3 · understanding
No, 1.[mismatch1]: The caption describes a vibrant green coffee, but the image shows a light brown coffee with latte art. 2.[mismatch2]: The caption mentions a heart-shaped design, but the image shows a heart-shaped latte art.
R4 · edit
Edit: Based on the instruction: 1.[mismatch1]: The caption describes a vibrant green coffee, but the image shows a light brown coffee with latte art. 2.[mismatch2]: The caption mentions a heart-shaped design, but the image shows a heart-shaped latte art., please perform image editing task.
R5 · understanding
No, 1.[mismatch1]: The image shows a cup with latte art, not a smooth, uniform green surface with fine bubbles. 2.[mismatch2]: The caption mentions a white ceramic cup, but the image shows a cup with a light-colored liquid, not white.
R6 · edit
Edit: Based on the instruction: 1.[mismatch1]: The image shows a cup with latte art, not a smooth, uniform green surface with fine bubbles. 2.[mismatch2]: The caption mentions a white ceramic cup, but the image shows a cup with a light-colored liquid, not white., please perform image editing task.
R7 · understanding
No, 1.[mismatch1]: The caption mentions a white ceramic cup filled with vibrant green coffee, but the image shows a light brown ceramic cup with a light brown coffee, not a white cup with green coffee. 2.[mismatch2]: The caption states there is no latte art visible, but the image shows a heart-shaped latte art design on the coffee.
R8 · edit
Edit: Based on the instruction: 1.[mismatch1]: The caption mentions a white ceramic cup filled with vibrant green coffee, but the image shows a light brown ceramic cup with a light brown coffee, not a white cup with green coffee. 2.[mismatch2]: The caption states there is no latte art visible, but the image shows a heart-shaped latte art design on the coffee., please perform image editing task.
R9 · understanding
No, 1.[mismatch1]: The image shows a cup with latte art, not a smooth, uniform green surface with fine bubbles. 2.[mismatch2]: The caption mentions a white ceramic cup, but the image shows a cup with a different color.
R10 · edit
Edit: Based on the instruction: 1.[mismatch1]: The image shows a cup with latte art, not a smooth, uniform green surface with fine bubbles. 2.[mismatch2]: The caption mentions a white ceramic cup, but the image shows a cup with a different color., please perform image editing task.
Target caption: A white ceramic cup filled with vibrant green coffee sits on a dark wooden surface, with no latte art visible—only a smooth, uniform green surface with fine bubbles. The heart-shaped design has been completely removed, leaving a plain, unadorned drink. The warm, earthy tones of the original are replaced by the striking green hue, contrasting with the dark wood and light wooden edge on the left.
UniWorld-V1#184 Rounds 9 Und(g2u) 0.8 Gen(u2g) 0.625 Unify 0.713
Initial instruction: Add a cherry on top of the chocolate balls, alter the ingredient of the cake layers from chocolate to fruit.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (6 rounds · 3 edits)
Original
R1 · edit
Edit: Add a cherry on top of the chocolate balls, alter the ingredient of the cake layers from chocolate to fruit.
R2 · understanding
No, Add a cherry on top of the chocolate balls The instruction to add a cherry on top of the chocolate balls is not satisfied in the edited image. The original image shows chocolate balls on top of the cake, but the edited image does not have any cherry on top of the chocolate balls.
R3 · edit
Edit: Add a cherry on top of the chocolate balls The instruction to add a cherry on top of the chocolate balls is not satisfied in the edited image. The original image shows chocolate balls on top of the cake, but the edited image does not have any cherry on top of the chocolate balls.
R4 · understanding
No, Add a cherry on top of the chocolate balls, alter the ingredient of the cake layers from chocolate to fruit.
R5 · edit
Edit: Add a cherry on top of the chocolate balls, alter the ingredient of the cake layers from chocolate to fruit.
R6 · understanding
No, Add a cherry on top of the chocolate balls, alter the ingredient of the cake layers from chocolate to fruit.
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (9 rounds · 4 edits)
Original
R1 · understanding
No, 1.[mismatch1]: The caption describes a fruit cake, but the image shows a chocolate cake. 2.[mismatch2]: The caption mentions a single red cherry, but the image does not show a cherry.
R2 · edit
Edit: Based on the instruction: 1.[mismatch1]: The caption describes a fruit cake, but the image shows a chocolate cake. 2.[mismatch2]: The caption mentions a single red cherry, but the image does not show a cherry., please perform image editing task.
R3 · understanding
No, 1.[caption mentions a three-layer fruit cake, but the image shows a single layer cake with a glossy topping, possibly chocolate ganache, and a red fruit filling in the center]:[The caption should be revised to reflect the single layer cake with a glossy topping and a red fruit filling].
R4 · edit
Edit: Based on the instruction: 1.[caption mentions a three-layer fruit cake, but the image shows a single layer cake with a glossy topping, possibly chocolate ganache, and a red fruit filling in the center]:[The caption should be revised to reflect the single layer cake with a glossy topping and a red fruit filling]., please perform image editing task.
R5 · understanding
No, 1.[mismatch1]: The image shows a chocolate cake with a glossy chocolate glaze, not a fruit cake with a meringue topping. 2.[mismatch2]: The image does not show a single red cherry on the meringue, but rather a glossy chocolate glaze covering the top of the cake.
R6 · edit
Edit: Based on the instruction: 1.[mismatch1]: The image shows a chocolate cake with a glossy chocolate glaze, not a fruit cake with a meringue topping. 2.[mismatch2]: The image does not show a single red cherry on the meringue, but rather a glossy chocolate glaze covering the top of the cake., please perform image editing task.
R7 · understanding
No, 1.[caption mentions a three-layer fruit cake, but the image shows a single layer cake with a chocolate topping and a cherry on top]:[edit the caption to describe a single layer cake with a chocolate topping and a cherry on top].
R8 · edit
Edit: Based on the instruction: 1.[caption mentions a three-layer fruit cake, but the image shows a single layer cake with a chocolate topping and a cherry on top]:[edit the caption to describe a single layer cake with a chocolate topping and a cherry on top]., please perform image editing task.
R9 · understanding
No
Target caption: A three-layer fruit cake slice sits on a white plate, its bright, fluffy layers separated by creamy chocolate frosting. The top is crowned with a glossy, slightly toasted white meringue, adorned with a cluster of shiny chocolate beads and a single red cherry. The cake is set against a dark, blurred background, highlighting its vibrant colors and rich textures.
SEED-X-17B#49 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.542 Unify 0.271
Initial instruction: Remove the man's cowboy hat, and replace the red vintage car with a blue vintage car.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Remove the man's cowboy hat, and replace the red vintage car with a blue vintage car.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (1 round · 0 edits)
Original
R1 · understanding
No
Target caption: A man in a blue plaid shirt and sunglasses, without hat, stands beside a classic blue sports car. The sports car is parked in a lot surrounded by modern vehicles, including a rainbow-hued sedan and a red car with its hood open, under a bright, clear sky.
SEED-X-17B#191 Rounds 2 Und(g2u) 0.0 Gen(u2g) 0.542 Unify 0.271
Initial instruction: Remove the white fur on the ground near the wheel, and replace the green dress with a warrior clothing.
U2G Understanding-guided Generation: edit → check edit vs instruction → re-edit (2 rounds · 1 edit)
Original
R1 · edit
Edit: Remove the white fur on the ground near the wheel, and replace the green dress with a warrior clothing.
R2 · understanding
Yes
G2U Generation-guided Understanding: detect image-vs-caption mismatches → edit → re-detect (1 round · 0 edits)
Original
R1 · understanding
No
Target caption: A woman in a warrior clothing and white fur cuffs stands center stage, her red hair adorned with a floral crown. To her left is a rustic wooden cart decorated with festive garlands, fruits, a small Christmas tree; there is no other things on the ground. In the background, a man in a green coat and hat observes, with a dark, silhouetted cityscape behind them, maintaining the theatrical holiday ambiance.
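The U2G traces above alternate an edit round with an understanding round until the check answers "Yes" or the rounds run out. The sketch below mirrors that pattern under stated assumptions: `run_u2g`, the `edit` and `check` callables, and the `max_rounds` budget are hypothetical names, not the benchmark's actual code, and reusing the text after "No," as the next edit instruction is inferred from the logged rounds.

```python
from typing import Any, Callable, List, Tuple


def run_u2g(
    image: Any,
    instruction: str,
    edit: Callable[[Any, str], Any],    # hypothetical: the model's editing call
    check: Callable[[Any, str], str],   # hypothetical: returns "Yes" or "No, <feedback>"
    max_rounds: int = 6,                # assumed budget; the U2G logs above show up to 6 rounds
) -> Tuple[Any, List[Tuple[str, str]]]:
    """Alternate edit and understanding rounds, logging each as (label, text)."""
    log: List[Tuple[str, str]] = []
    current = instruction
    rounds = 0
    while rounds + 2 <= max_rounds:
        image = edit(image, current)
        rounds += 1
        log.append((f"R{rounds} · edit", current))

        verdict = check(image, instruction)
        rounds += 1
        log.append((f"R{rounds} · understanding", verdict))
        if verdict.strip().lower().startswith("yes"):
            break

        # Assumption: whatever follows "No," becomes the next edit instruction;
        # fall back to the original instruction if no feedback was given.
        feedback = verdict.split(",", 1)[1].strip() if "," in verdict else ""
        current = feedback or instruction
    return image, log


if __name__ == "__main__":
    # Stub model: an "image" is just the list of edits applied so far,
    # and the check passes once two edits have been made.
    stub_edit = lambda img, ins: img + [ins]
    stub_check = lambda img, ins: "Yes" if len(img) >= 2 else "No, " + ins
    _, trace = run_u2g([], "change the driveway color to black.", stub_edit, stub_check)
    for label, text in trace:
        print(label, "|", text)
```

With the stub model this yields four logged rounds and two edits, the same shape as the "4 rounds · 2 edits" traces above.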