Adam's Lair Forum

game development and casual madness

All times are UTC + 1 hour [ DST ]




Posted: 2017/09/20, 22:40
Site Admin

Joined: 2013/05/11, 22:30
Posts: 2019
Location: Germany
Role: Professional
Nice work, good to see you're making progress :1+:

Faithless wrote:
When the debugger is attached and the frame limit is disabled, the loading is much faster. So my first question here is: How can the frame limit be disabled in general?


You can disable the frame limit via Settings / Default User Data in the editor, just make sure VSync is off. The thing is, there is currently no way to switch that at runtime, and it may introduce tearing, so probably not a good tradeoff just for the loading screen.

However, if you're bound by waiting for VSync, why not just do N items per frame instead of one? That should solve the problem as well.

Quote:
I used 3025 big hexes for the map, which results in approximately 110,000 small hexes. Every big hex has a unique, distinguishable texture, so no two big hexes are the same.
Quote:
First, I ran some tests using 512x512 pixel textures for the big hexes. Performance on full view is roughly 26 FPS on my machine.
Quote:
Second, I ran some tests using 256x256 pixel textures for the big hexes. No lag here when zooming and scrolling right from the beginning, and the frame rate is 57 FPS.


I'm 90% sure that you're GPU limited perf-wise, and 80% sure that the reason is either texture bandwidth limits or texture switches. If you're rendering the entire 3025 big hexes, and each of them has a unique 512x512 texture, not only are you paying for 3025 texture switches per frame (which is already not the cheapest thing to do), but you're also pushing up to ~ 3GB of texture data through your rendering every frame just for the terrain. (1 big hex == 512x512x4 byte == 1 MB)

Granted, my bandwidth calculation is probably somewhat wrong because I didn't take mipmapping into account, but that still feels a bit too much for this use case xD

Assuming that I am correct on this, the first thing I'd try for optimization is to reduce big hex tile size to 128x128 or less, reducing the texture memory bandwidth usage from 3GB to about 200MB. Next, now that you've cleared up the raw amount of generated and rendered data, reduce texture switches / state changes by putting all big hexes on a single, shared texture, or at least a very small set (<10) of shared textures.
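
For reference, the arithmetic behind those figures can be written out as below. It is only a rough estimate under the assumptions already stated: 4 bytes per texel, no mipmapping, and every texel of every big hex texture read once per frame. The 256x256 row is added for comparison with the tested configuration.

```latex
\begin{align*}
512 \times 512 \times 4\ \text{bytes} &= 1\ \text{MiB per big hex}
  & 3025 \times 1\ \text{MiB} &= 3025\ \text{MiB} \approx 2.95\ \text{GiB} \\
256 \times 256 \times 4\ \text{bytes} &= 256\ \text{KiB per big hex}
  & 3025 \times 256\ \text{KiB} &\approx 756\ \text{MiB} \\
128 \times 128 \times 4\ \text{bytes} &= 64\ \text{KiB per big hex}
  & 3025 \times 64\ \text{KiB} &\approx 189\ \text{MiB}
\end{align*}
```

Halving the edge length cuts the data to a quarter, which is where the drop from roughly 3GB to roughly 200MB comes from.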

If you reduce to 128x128, that means that you can switch to the big hex only rendering as soon as one big hex gets smaller than 128x128 on screen without noticeable visual impact. Given your viewport, this means that you'd have to deal with about 90 big hexes rendered as about 3000 small hexes. Assuming you have an efficient hex map renderer by then, this should be doable, although it will show up in the profiler / frame timings. If it turns out to be too much, there should still be ways to deal with this by switching to big hex rendering a bit sooner.

Before you start, if you want to quickly check whether texture bandwidth is a problem, reduce big hex texture size to 1x1 (but keep one per big hex) and see if it gets better, and also by how much. To check the impact of texture state changes, you pretty much have to implement the shared texture thing though, not sure if there's an easy up front test.

As soon as the GPU perf is no longer the limiting factor (again assuming the above is correct), you can start to look at the CPU side of rendering, e.g. submit as many hexes as possible in a single batch, optimize drawing routines, interact with IDrawDevice directly instead of via Canvas like SirePi said, etc.

Edit: Oh, another thing: Duality by default renders using 16x MSAA antialiasing. You can change your default user data settings to only use medium or low quality antialiasing and see if it makes much of a difference for your game.

_________________
Blog | GitHub | Twitter (@Adams_Lair)


Posted: 2017/09/21, 12:22
Junior Member

Joined: 2017/07/20, 18:17
Posts: 27
Location: Germany
Role: Hobbyist
Adam wrote:
Nice work, good to see you're making progress :1+:
SirePi wrote:
Whoa, that's some serious stuff you put together :+1: great idea on using async/await as well.. it might be time for me to review my thread-based loading screens as well :D

Thanks, I'll do my best ^^ I don't know how you did the thread-based loading, but I can explain the async/await technique if you are interested.

Adam wrote:
I'm 90% sure that you're GPU limited perf-wise, and 80% sure that the reason is either texture bandwidth limits or texture switches. If you're rendering the entire 3025 big hexes, and each of them has a unique 512x512 texture, not only are you paying for 3025 texture switches per frame (which is already not the cheapest thing to do), but you're also pushing up to ~ 3GB of texture data through your rendering every frame just for the terrain. (1 big hex == 512x512x4 byte == 1 MB)

Thanks for the explanation, I didn't realize what I was doing to my graphics card here ^^

SirePi wrote:
Back on topic, from what I could see in your code, the next step now for you would be to ditch the Canvas-based drawing and start pushing vertices directly to the IDrawDevice. Try and have a peek to Duality's SpriteRenderer class..
Adam wrote:
As soon as the GPU perf is no longer the limiting factor (again assuming the above is correct), you can start to look at the CPU side of rendering, e.g. submit as many hexes as possible in a single batch, optimize drawing routines, interact with IDrawDevice directly instead of via Canvas like SirePi said, etc.

I see that improvements here may also increase the frame rate, but I don't know how this is done. I took a look at the SpriteRenderer.Draw() and Canvas.FillRect() methods, and the code there always seems to end by passing the vertices, together with one texture, to the draw device:
Code:
device.AddVertices(this.customMat, VertexMode.Quads, this.vertices);
Code:
device.AddVertices(this.State.MaterialDirect, VertexMode.Quads, vertices, 4);

So would doing this directly in my code, instead of using these classes, really be an improvement?

Adam wrote:
Edit: Oh, another thing: Duality by default renders using 16x MSAA antialiasing. You can change your default user data settings to only use medium or low quality antialiasing and see if it makes much of a difference for your game.

I checked this. Reducing the antialiasing quality had only a very small impact of about 1 frame per second, so I decided to favor quality over performance in this particular case :)

Adam wrote:
However, if you're bound by waiting for VSync, why not just do N items per frame instead of one? That should solve the problem as well.

Yes, this sounds good, but again I really don't know how to do this in code. My prerendering code looks like this:
Code:
foreach (var terrain in gameBoard.Terrains)
{
  // One off-screen texture per terrain: nearest filtering, clamped at the edges.
  var terrainTexture = new Texture(240, 224,
    TextureSizeMode.Default,
    TextureMagFilter.Nearest,
    TextureMinFilter.Nearest, TextureWrapMode.Clamp, TextureWrapMode.Clamp);

  var renderTarget = new RenderTarget(AAQuality.High, terrainTexture);

  // Flat, screen-space draw device that renders into the target instead of the screen.
  var drawDevice = new DrawDevice();
  drawDevice.Perspective = PerspectiveMode.Flat;
  drawDevice.VisibilityMask = VisibilityFlag.AllGroups | VisibilityFlag.ScreenOverlay;
  drawDevice.RenderMode = RenderMatrix.OrthoScreen;
  drawDevice.Target = renderTarget;
  drawDevice.ViewportRect = new Rect(renderTarget.Width, renderTarget.Height);

  drawDevice.PrepareForDrawcalls();
  canvas = new Canvas(drawDevice);

  // Queue the draw calls for this terrain (loop details elided).
  for (...)
  {
    canvas.PushState();
    canvas.State.SetMaterial(...);
    canvas.State.TextureCoordinateRect = new Rect(0f, 0f, 1f, 1f);
    canvas.FillRect(...);
    canvas.PopState();
  }

  // Flush the queued draw calls into the render target, then wrap the result in a material.
  drawDevice.Render(ClearFlag.All, ColorRgba.TransparentBlack, 10f);
  var terrainMaterial = new Material(DrawTechnique.Mask, ColorRgba.White, terrainTexture).GetContentRef().As<Material>();

  renderTarget.Dispose();

  // ...
}

How are multiple items rendered in one frame here? I don't see how I can define what's rendered in the first frame and what's rendered in the next.

Adam wrote:
Before you start, if you want to quickly check whether texture bandwidth is a problem, reduce big hex texture size to 1x1 (but keep one per big hex) and see if it gets better, and also by how much. To check the impact of texture state changes, you pretty much have to implement the shared texture thing though, not sure if there's an easy up front test.

As you suggested, I tested using a 1x1 pixel texture for the big hexes, but it had zero impact on the frame rate: I still got 57 FPS on my machine. The second thing I tested was using only one texture state for all tiles, so that no texture state changes are made. In this case the frame rate goes up to 92 FPS. I think this means that bandwidth is not a problem with 256x256 big hex textures, but you may be right to lower the texture size to 128x128, because there will be more things to render later. It would also be nice if the game ran smoothly on weaker graphics cards. The texture state changes really are an issue here, though, so your proposal of putting several big hexes on one texture should be an improvement.

Next, I also need to implement the rendering of the small hexes at close zoom to see how that case performs. Because all small hexes are already on one tilemap texture, I hope drawing a few thousand of them won't be a problem :)


Posted: 2017/09/21, 14:24
Site Admin

Joined: 2013/05/11, 22:30
Posts: 2019
Location: Germany
Role: Professional
Quote:
So would doing this directly in my code, instead of using these classes, really be an improvement?

Canvas is a convenience class on top of IDrawDevice, and its overhead is not that big in itself - but multiply it by 3000 and it becomes noticeable. The fastest thing you can do in terms of data processing is reading / writing directly and continuously on an array of structs. Vertex data are structs, and the drawing device accepts an array of them - so in terms of performance, you can't get much faster than manually filling that array.

Canvas also does some redundant work, because each of its drawing calls is treated as an individual operation: each one goes back to the device to check transformation settings and scale, and each one uses its own separate vertex array, which is transformed individually. Lots of small things that could be one big thing instead when you need 3000 of them!

Take a look at the TilemapRenderer as an example. It's quite efficient at rendering thousands of tiles with no big impact.
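
To illustrate the array-of-structs point, here is a minimal sketch in plain C#. It deliberately does not use Duality types: `HexVertex`, `HexQuadBatch` and `Fill` are hypothetical names made up for this example, and the vertex layout is an assumption, not the engine's actual vertex format. It only shows the pattern of writing four vertices per quad into one reusable array in a single pass.

```csharp
using System;

// Hypothetical stand-in for an engine vertex struct (not a Duality type).
public struct HexVertex
{
    public float X, Y;   // position
    public float U, V;   // texture coordinate
    public uint Color;   // packed RGBA
}

public static class HexQuadBatch
{
    // Writes four vertices per quad into one shared array and returns the
    // number of vertices written. The array is only reallocated when it is
    // too small, so it can be reused across frames.
    public static int Fill(ref HexVertex[] vertices, float[] centersX, float[] centersY, float halfSize)
    {
        int quadCount = Math.Min(centersX.Length, centersY.Length);
        int vertexCount = quadCount * 4;
        if (vertices == null || vertices.Length < vertexCount)
            vertices = new HexVertex[vertexCount];

        for (int i = 0; i < quadCount; i++)
        {
            int v = i * 4;
            float cx = centersX[i];
            float cy = centersY[i];
            // Corner order: top-left, bottom-left, bottom-right, top-right.
            Write(ref vertices[v + 0], cx - halfSize, cy - halfSize, 0f, 0f);
            Write(ref vertices[v + 1], cx - halfSize, cy + halfSize, 0f, 1f);
            Write(ref vertices[v + 2], cx + halfSize, cy + halfSize, 1f, 1f);
            Write(ref vertices[v + 3], cx + halfSize, cy - halfSize, 1f, 0f);
        }
        return vertexCount;
    }

    private static void Write(ref HexVertex vertex, float x, float y, float u, float v)
    {
        vertex.X = x;
        vertex.Y = y;
        vertex.U = u;
        vertex.V = v;
        vertex.Color = 0xFFFFFFFF; // opaque white
    }
}
```

In an actual renderer, the filled array and the returned count would then be handed to the draw device in one call per material, rather than one call per hex.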

Quote:
How are multiple items rendered in one frame here? I don't see how I can define what's rendered in the first frame and what's rendered in the next.


Duality rendering code is not limited by VSync in itself, only the rendering loop itself is. Since you're the one who wrote the async and synchronization code, I don't see how to define that either xD What I'm responding to is your statement that loading is faster when VSync is disabled (which is the case in profiling and debug mode), so I am assuming that there is some part of your loading sync code that waits until the next frame. This is the part you'll need to adjust, assuming it exists in some form or another.
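
As a rough illustration of the "N items per frame" idea, here is a minimal sketch in plain C#. `BatchedLoader` and its members are hypothetical names, not Duality API and not the loading code discussed in this thread. It assumes each work item is one unit of loading (for example, prerendering one big hex texture) and that `Update()` is called once per frame.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: runs up to a fixed number of queued work items per update.
public sealed class BatchedLoader
{
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly int itemsPerUpdate;

    public BatchedLoader(int itemsPerUpdate)
    {
        if (itemsPerUpdate < 1)
            throw new ArgumentOutOfRangeException(nameof(itemsPerUpdate));
        this.itemsPerUpdate = itemsPerUpdate;
    }

    // True once every queued work item has been executed.
    public bool IsDone
    {
        get { return this.pending.Count == 0; }
    }

    public void Enqueue(Action workItem)
    {
        if (workItem == null)
            throw new ArgumentNullException(nameof(workItem));
        this.pending.Enqueue(workItem);
    }

    // Call once per frame. Executes at most itemsPerUpdate items
    // and returns how many were actually executed.
    public int Update()
    {
        int processed = 0;
        while (processed < this.itemsPerUpdate && this.pending.Count > 0)
        {
            this.pending.Dequeue().Invoke();
            processed++;
        }
        return processed;
    }
}
```

With one item per update, a 60 FPS frame limit caps loading at 60 items per second regardless of how fast each item is. With a larger batch size, the limit moves to whatever fits into a frame's time budget.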

Quote:
As you suggested, I tested using a 1x1 pixel texture for the big hexes, but it had zero impact on the frame rate: I still got 57 FPS on my machine. The second thing I tested was using only one texture state for all tiles, so that no texture state changes are made. In this case the frame rate goes up to 92 FPS. I think this means that bandwidth is not a problem with 256x256 big hex textures, but you may be right to lower the texture size to 128x128, because there will be more things to render later.


Yep, seems like going down to 256x256 reduced bandwidth sufficiently on your machine :) I totally agree with you that it would make sense to keep it low, especially with slower machines in mind and that there will be more to render later. Maybe (?) you could even go down further than 128x128, depending on how your zoom / LOD switch is set up. As soon as you're using a shared texture, you could also deviate from the power-of-two sizes.

Posted: 2017/09/22, 15:25
Junior Member

Joined: 2017/07/20, 18:17
Posts: 27
Location: Germany
Role: Hobbyist
Adam wrote:
Canvas is a convenience class on top of IDrawDevice, and its overhead is not that big in itself - but multiply it by 3000 and it becomes noticeable. The fastest thing you can do in terms of data processing is reading / writing directly and continuously on an array of structs. Vertex data are structs, and the drawing device accepts an array of them - so in terms of performance, you can't get much faster than manually filling that array.

Canvas also does some redundant work, because each of its drawing calls is treated as an individual operation: each one goes back to the device to check transformation settings and scale, and each one uses its own separate vertex array, which is transformed individually. Lots of small things that could be one big thing instead when you need 3000 of them!

Take a look at the TilemapRenderer as an example. It's quite efficient at rendering thousands of tiles with no big impact.

Okay, I see. I may take a look at this later, but as you said, it's more important to optimize the graphics-card-related stuff first.

Adam wrote:
Duality rendering code is not limited by VSync in itself, only the rendering loop itself is. Since you're the one who wrote the async and synchronization code, I don't see how to define that either xD What I'm responding to is your statement that loading is faster when VSync is disabled (which is the case in profiling and debug mode), so I am assuming that there is some part of your loading sync code that waits until the next frame. This is the part you'll need to adjust, assuming it exists in some form or another.

Ah, that was the decisive hint. My async/await code was tied to the update call, and each update caused only one render call. I changed this so that 50 render calls are now made per update, and preloading with VSync enabled (60 FPS limit) is now much faster, even faster than it was without VSync. I tested prerendering of 256x256 big hex textures, and it now finishes in 7 to 8 seconds. Reducing the big hex texture size will reduce the loading time further.

Okay, so next I will implement the small hex rendering at close scale, then the switch from close scale to large scale, and then put multiple 128x128 pixel big hexes on shared big hex tilemap textures. I suppose/hope this will be enough optimization ^^


Posted: 2017/09/23, 21:06
Junior Member

Joined: 2017/07/20, 18:17
Posts: 27
Location: Germany
Role: Hobbyist
So, I have now implemented the small hex rendering. It works on my machine up to roughly 25,000 small tiles, then the frame rate drops below 60 FPS. That's totally okay, because I already switch to big hex rendering at roughly 10,000 small hexes. That's approximately the zoom level where the big hexes are smaller than 128 pixels.

I also implemented texture maps for the big hexes. I decided to prerender all big hexes in the same row of the square hex map onto the same texture. I think that's the easiest way, because the big hexes can also easily be drawn row by row, so a texture switch is only needed when moving on to the next row. I know there is still more optimization potential here, but the 3,025 hexes are now drawn at 60 FPS:
Image
I was also able to reduce the loading time with the new shared big hex textures to 5 seconds.

I ran some tests and found that rendering 4,500 big hexes causes the first frame rate drop:
Image
I am totally fine with this performance, so I'll skip CPU-based optimizations for now.

Just for fun, I also tried 10,000 big hexes, and performance goes down to 25 FPS:
Image

So, thanks for the moment, the game map is now performing quite well :)

One last thing: I noticed that rendering to textures larger than 8192 pixels no longer worked on my machine. The textures stay empty although big hexes have been rendered to them, and no exception was thrown. Question: how can I find out the maximum texture size supported by the graphics card / driver?


Posted: 2017/09/24, 20:01
Site Admin

Joined: 2013/05/11, 22:30
Posts: 2019
Location: Germany
Role: Professional
Nice work, great to see all the progress there :) :1+:

Quote:
One last thing: I noticed that rendering to textures larger than 8192 pixels no longer worked on my machine. The textures stay empty although big hexes have been rendered to them, and no exception was thrown. Question: how can I find out the maximum texture size supported by the graphics card / driver?


There's currently no API to check for those limits on the current machine (that would be a nice feature request / todo actually). I tried a quick web search to see if there are stats about that, but couldn't find any that specific so far, but 8192 seems quite large. My gut feeling would be that you can count on something like 2048 as a safe bet for compatibility.
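
To get a feel for what a conservative size limit means for shared big hex textures, here is a small planning sketch in plain C#. `AtlasPlanner` and its methods are hypothetical names for this example only. It assumes square atlas pages, a simple grid layout without padding between tiles, and that the page size is whatever limit you decide to trust (such as the 2048 guess above).

```csharp
using System;

// Hypothetical helper for estimating how many shared textures ("pages") are needed.
public static class AtlasPlanner
{
    // Number of tiles that fit on one square page in a plain grid layout.
    public static int TilesPerPage(int pageSize, int tileWidth, int tileHeight)
    {
        if (pageSize <= 0 || tileWidth <= 0 || tileHeight <= 0)
            throw new ArgumentOutOfRangeException("All sizes must be positive.");
        return (pageSize / tileWidth) * (pageSize / tileHeight);
    }

    // Number of pages needed to hold tileCount tiles, rounded up.
    public static int PagesNeeded(int tileCount, int pageSize, int tileWidth, int tileHeight)
    {
        int perPage = TilesPerPage(pageSize, tileWidth, tileHeight);
        if (perPage == 0)
            throw new ArgumentException("A single tile does not fit on a page.");
        return (tileCount + perPage - 1) / perPage;
    }
}
```

For 3025 big hexes at 128x128, a 2048 page holds 16 x 16 = 256 tiles, so 12 pages are needed. A 4096 page holds 1024 tiles, which brings it down to 3 pages.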

Posted: 2017/09/25, 11:12
Junior Member

Joined: 2017/07/20, 18:17
Posts: 27
Location: Germany
Role: Hobbyist
Adam wrote:
There's currently no API to check for those limits on the current machine (that would be a nice feature request / todo actually). I tried a quick web search to see if there are stats about that, but couldn't find any that specific so far, but 8192 seems quite large. My gut feeling would be that you can count on something like 2048 as a safe bet for compatibility.

Okay, I created a new issue on GitHub. It's not an important one for me, but this way it won't be forgotten.


Posted: 2017/09/25, 15:24
Site Admin

Joined: 2013/05/11, 22:30
Posts: 2019
Location: Germany
Role: Professional
Neat, thanks! :1+:


