LeCom wrote:Oh good then I'm not the only one seeing faults in the development of IB (not saying that it's crap though). GM and his team just have wrong priorities. They seem to emphasize miscellaneous or unimportant things, while IB itself has nothing to offer as a game. It's more like a game engine, but very complex and hard to mod.
I think that's right, well put. Being complex and hard to mod is definitely an issue when someone doesn't have much experience with things.
VoxelWar Discussion thread
Anyone happen to have a copy of VoxelWar's source?
It looks like the forums and site are down....
Uhh, did he make his own raycasting engine from scratch?
longbyte1 wrote:Uhh, did he make his own raycasting engine from scratch?
If you mean me, I wrote both from scratch, yes, how else.
PS: The SVO 4DOF raycaster is kind of finished and is actually shitty on CPUs, something like 10 FPS for 320x240 on a 2.4 GHz CPU and DDR3 RAM (single thread only and infinite visibility tho') and still a few rendering glitches. But it shows that it's possible and that it's not far away from our 60 FPS Full HD target (and consider how fast GPUs are in comparison).
-
Marisa Kirisame
Deuced Up - Posts: 152
- Joined: Sat Sep 21, 2013 10:52 pm
LeCom wrote:
longbyte1 wrote:Uhh, did he make his own raycasting engine from scratch?
If you mean me, I wrote both from scratch, yes, how else.
PS: The SVO 4DOF raycaster is kind of finished and is actually shitty on CPUs, something like 10 FPS for 320x240 on a 2.4 GHz CPU and DDR3 RAM (single thread only and infinite visibility tho') and still a few rendering glitches. But it shows that it's possible and that it's not far away from our 60 FPS Full HD target (and consider how fast GPUs are in comparison).
Port it to GLSL for great justice performance
Also, use beamcasting starting with e.g. 16x16 regions, then recursing down to suit
Note I haven't done the latter yet but if you combine the two you'd probably use a mipmap pyramid for your texture and do several FBO stages
longbyte1 wrote:We should move on to create our own replicas of AoS instead (call them "tributes" if you'd like): we learn far more and take far less time making it ourselves than trying to reverse engineer a heavily obfuscated, hand-optimized work of art.
Tbh the only way I ever programmed a GPU was via SDL2's rendering API because I was always a voxel and raycasting/tracing fanboy. But fine, I think my hardware supports PND3D's GLSL mode, so why not try it.
At the moment I AM using a mipmap pyramid. Couldn't figure out anything else that would be good in a game with changing, and possibly complex terrain.
I had confirmed with some tests that the main reason for the slowness was the fact that the CPU has to cycle through every single pixel. That's why I'm figuring out how to implement an algorithm that casts only one single ray for the upper mip levels, and then divides it into several new rays when going down a level.
I see, people are still asking for source. Dunno if I even can get the external drive containing it, but may I ask for what purpose at least? Note that it's pretty much what used to be called spaghetti code back then, aka weird and mostly make-shift stuff, aka it's better to rewrite it than to continue working with it (also the problem with Voxlap).
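For anyone following along, the "one ray for the upper levels, then split" idea could look roughly like the sketch below. This is only an illustration under assumptions, not VoxelWar's code: beam_fn, beam_cast and demo_trace are made-up names, the region size is assumed to be a power of two, and the stand-in scene is just a vertical edge instead of a real mip pyramid lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: casts ONE ray for the square pixel region whose
 * top-left corner is (x, y) and whose side is `size`. Returns nonzero when
 * that single result is good enough for the whole region and writes the
 * color to *color. For size == 1 it must always write a color. */
typedef int (*beam_fn)(int x, int y, int size, unsigned *color, void *user);

/* Renders one square region, splitting it into four children whenever the
 * coarse ray is not representative. Returns the number of rays cast.
 * Assumes `size` is a power of two. */
static long beam_cast(unsigned *pixels, int pitch_px, int x, int y, int size,
                      beam_fn trace, void *user)
{
    unsigned color = 0;
    long rays = 1;
    int uniform;
    int half;

    assert(size >= 1);
    uniform = trace(x, y, size, &color, user);

    if (uniform || size == 1) {
        int px, py;
        /* One ray was enough: fill the whole region with its result. */
        for (py = y; py < y + size; py++)
            for (px = x; px < x + size; px++)
                pixels[(size_t)py * (size_t)pitch_px + (size_t)px] = color;
        return rays;
    }

    /* Not uniform: recurse into the four quadrants. */
    half = size / 2;
    rays += beam_cast(pixels, pitch_px, x,        y,        half, trace, user);
    rays += beam_cast(pixels, pitch_px, x + half, y,        half, trace, user);
    rays += beam_cast(pixels, pitch_px, x,        y + half, half, trace, user);
    rays += beam_cast(pixels, pitch_px, x + half, y + half, half, trace, user);
    return rays;
}

/* Stand-in scene for demonstration only: color 1 left of a vertical edge,
 * color 2 right of it. `user` points at the edge's x coordinate. */
static int demo_trace(int x, int y, int size, unsigned *color, void *user)
{
    int edge = *(const int *)user;
    (void)y;
    if (x + size <= edge) { *color = 1; return 1; }
    if (x >= edge)        { *color = 2; return 1; }
    return 0; /* region straddles the edge: caller must split */
}
```

In a real tracer the callback would presumably check the coarse mip level and only report "uniform" when the whole region hits the same coarse voxel (or nothing at all), so the savings depend entirely on how often that happens.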
-
Marisa Kirisame
Deuced Up - Posts: 152
- Joined: Sat Sep 21, 2013 10:52 pm
If you want to improve performance, throw more cores and SIMD lanes at it ;) OpenMP makes the "more cores" thing easy. Just make sure you do something like this:
Code: Select all
int y;
#pragma omp parallel for
for(y = 0; y < height; y++)
{
    /* x and dest are declared inside the loop body, so every thread gets its own copy */
    int x;
    Uint32 *dest = (Uint32 *)((Uint8 *)screen->pixels + screen->pitch*y);
    for(x = 0; x < width; x++, dest++)
    {
        ...
    }
}
and not this:
Code: Select all
int x, y;
Uint32 *dest = (Uint32 *)screen->pixels;
#pragma omp parallel for
for(y = 0; y < height; y++, dest += (screen->pitch-width)/4)
{
    for(x = 0; x < width; x++, dest++)
    {
        ...
    }
}
as the threads will share the same x + dest and will dissolve into a scattered mess.
The SIMD approach works better if you do something like this:
Code: Select all
for(x = 0; x < width; x += 4)
{
    __m128 posx, posy, posz;
    __m128 velx, vely, velz;
    __m128 time;
    ...
    posx = _mm_add_ps(posx, _mm_mul_ps(time, velx));
    posy = _mm_add_ps(posy, _mm_mul_ps(time, vely));
    posz = _mm_add_ps(posz, _mm_mul_ps(time, velz));
    ...
}
Rather than:
Code: Select all
for(x = 0; x < width; x++)
{
    __m128 pos;
    __m128 vel;
    float time;
    ...
    pos = _mm_add_ps(pos, _mm_mul_ps(_mm_set1_ps(time), vel));
    ...
}
Oh, and it's best to code that sort of stuff in C, as even I am not better at coding assembly than a C compiler.
LeCom wrote:That's why I'm figuring out how to implement an algorithm that casts only one single ray for the upper mip levels, and then divides it into several new rays when going down a level.
Yep, that's beamtracing.
I'm not relying that much on multithreading, especially since the speedup per extra core is only around 70% of a core's power. Most hardware also only has 4-8 cores, and changing to beamtracing would make it hard to parallelise (btw any idea why google returns basically nothing usable for beamtracing?). Dunno about SIMD yet, my usage of MMX in the voxlap hack was pretty much a fail. Btw I don't really get your recommendation: you want me to use 128-bit registers for stuff I can do with floats, instead of doing the actual SIMD thing?
Also, yes, C rules when it comes to these things. I was considering D for some time because of its high-level stuff (compiled+high-level=speed?), but I doubt it now.
-
Marisa Kirisame
Deuced Up - Posts: 152
- Joined: Sat Sep 21, 2013 10:52 pm
LeCom wrote:I'm not relying that much on multithreading. Especially since the speed up per core is only around 70% of a core's power. Then most hardware usually only has 4-8,
For raytracing I get very close to double speed for two actual cores. For hyperthreading I still see an improvement, although not as much. It still helps.
LeCom wrote:and changing to beamtracing would make it hard to parallelise (btw any idea why google returns basically nothing usable for beamtracing?).
Just group the work into, say, 32x32 regions. It will be serial within the regions, but the regions will be traceable in parallel.
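A minimal sketch of that region grouping, assuming a plain 32-bit pixel buffer with the pitch given in pixels. The names render_tiled and trace_tile are hypothetical, and trace_tile just writes a coordinate pattern where a real (serial) beamtracer for the region would go:

```c
#include <assert.h>
#include <stdint.h>

#define TILE 32 /* region size suggested above */

/* Hypothetical per-region tracer. Everything inside one region runs
 * serially, so a beamtracer can recurse freely in here. This stand-in only
 * writes a pattern derived from the pixel coordinates. */
static void trace_tile(uint32_t *pixels, int pitch_px,
                       int x0, int y0, int x1, int y1)
{
    int x, y;
    for (y = y0; y < y1; y++)
        for (x = x0; x < x1; x++)
            pixels[y * pitch_px + x] = (uint32_t)x | ((uint32_t)y << 16);
}

/* Splits the frame into TILE x TILE regions and hands them out to threads.
 * No two regions touch the same pixel, so no locking is needed. */
static void render_tiled(uint32_t *pixels, int pitch_px, int width, int height)
{
    int tiles_x, tiles_y, n_tiles, t;

    assert(width > 0 && height > 0 && pitch_px >= width);
    tiles_x = (width + TILE - 1) / TILE;
    tiles_y = (height + TILE - 1) / TILE;
    n_tiles = tiles_x * tiles_y;

    /* dynamic scheduling: some regions will be far more expensive than others */
    #pragma omp parallel for schedule(dynamic)
    for (t = 0; t < n_tiles; t++) {
        int x0 = (t % tiles_x) * TILE;
        int y0 = (t / tiles_x) * TILE;
        int x1 = x0 + TILE < width  ? x0 + TILE : width;  /* clip at the right edge */
        int y1 = y0 + TILE < height ? y0 + TILE : height; /* clip at the bottom edge */
        trace_tile(pixels, pitch_px, x0, y0, x1, y1);
    }
}
```

Build with OpenMP enabled (e.g. -fopenmp on GCC) to get the threads; without it the pragma is ignored and the same code simply runs serially.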
LeCom wrote:Dunno about SIMD yet, my usage of MMX in the voxlap hack was pretty much of a fail. Btw I don't really get your recommendation, you want me to use 128 bit registers for stuff I can do with floats, instead of doing the actual SIMD thing?
No, I'm saying use the 128-bit registers to operate on 4 rays at once, rather than using them as a 4-float vector.
Also if you arrange the 4 rays into 2x2 blocks it'll be easier to beamtrace and you may also slightly reduce the level of divergence. With that said, _mm_movemask_ps() is also useful (maskmove = copy while masking some values out, movemask = copy the top bit of each value and shove it into an int).
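To make the "4 rays at once" plus _mm_movemask_ps() combination concrete, here is a small sketch. The lane layout, the function names packet_pixel and packet_hit_floor, and the choice of a horizontal floor plane as the thing being hit are all assumptions made for illustration:

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_*_ps */

/* Assumed lane layout for a 2x2 packet anchored at pixel (x, y):
 * lane 0 = (x, y), lane 1 = (x+1, y), lane 2 = (x, y+1), lane 3 = (x+1, y+1). */
static void packet_pixel(int lane, int x, int y, int *px, int *py)
{
    assert(lane >= 0 && lane < 4);
    *px = x + (lane & 1);
    *py = y + (lane >> 1);
}

/* Intersects 4 rays with the horizontal plane at height floor_y.
 * oy holds the 4 ray origin heights, dy the 4 vertical direction components.
 * Writes the 4 hit distances to *t_out and returns a 4-bit mask in which
 * bit i is set when ray i hits the plane in front of its origin (t > 0). */
static int packet_hit_floor(__m128 oy, __m128 dy, float floor_y, __m128 *t_out)
{
    __m128 t   = _mm_div_ps(_mm_sub_ps(_mm_set1_ps(floor_y), oy), dy);
    __m128 hit = _mm_cmpgt_ps(t, _mm_setzero_ps()); /* all-ones in hitting lanes */
    *t_out = t;
    return _mm_movemask_ps(hit); /* top bit of each lane packed into an int */
}
```

A mask of 0 means all four rays missed and 15 means all four hit, so in both cases the packet can stay together. Anything in between is where the rays diverge and the lanes have to be handled separately or masked out.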
If you're curious and would like to have a nosey around with SSE stuff, the guide you want is the Intel Intrinsics Guide, which used to be a Java program but is now a notably nicer web app: https://software.intel.com/sites/landin ... sicsGuide/
LeCom wrote:Also, yes, C rules if it comes to these things. I was considering D for some time because of its high-level stuff (compiled+high-level=speed?), but I doubt so now.
I once wrote a raytracer in C++ which used virtual functions. The fact that I used virtual functions had negligible performance impact, because compilers these days are actually pretty good.
I don't know how good D's compiler is though. If you have no issues with C, then just stick with C.
----
Icarus North wrote:And how tf does this topic have nearly 70 pages, It's like the biggest topic in this forum
Biggest active thread. Biggest thread used to be the original Iceball thread but jdrew's shitty clan managed to beat that record by treating the thread like a chat room.
I could implement a stack for the rays and distribute them among cores and SSE# registers, or other stuff, I know. But first I want a working and good implementation of the tracer. If one's algorithm isn't good enough, not even SIMD or hyperthreading will help.
As for D, there's a GCC port (GDC) and an LLVM implementation. I don't know if LLVM's optimization code is language-independent and therefore as fast as the C one, but there's still GDC. C works pretty fine though, I just had some thoughts that certain high-level stuff in D could be faster than doing it by hand in C (pretty much like the ASM vs. C comparison). However, I don't think so anymore and stick to C anyway.
Edit: page 69
I actually once considered using D (heck, it lets you do inline asm!) but wasn't really sure of the performance given the limited assortment of available compilers. I'm glad you made the same consideration too.
longbyte1 wrote:I actually once considered using D (heck, it lets you do inline asm!) but wasn't really sure of the performance given the limited assortment of available compilers. I'm glad you made the same consideration too.
I don't really get the link between the number of available compilers and language performance. Out of the 3 main D compilers available, one is the reference implementation with a focus on reliability and correctness, and the other two are ports of the two most important and fastest compilers out there (GCC and LLVM). Plus, benchmarks say that D is almost as fast as C/C++, the main slowdown reason being the shitty garbage collector (that you can disable ofc).
Moreover, inline asm is nothing compared to the freshly added OOP-based SIMD implementation.
LeCom wrote:
longbyte1 wrote:I actually once considered using D (heck, it lets you do inline asm!) but wasn't really sure of the performance given the limited assortment of available compilers. I'm glad you made the same consideration too.
I don't really get the link between compiler selection width and language performance. Out of the 3 main D compilers available, one is the reference implementation with focus on reliability and correctness, and the other are ports of the two most important and fastest compilers out there (GCC and LLVM). Plus, benchmarks say that D is almost as fast as C/C++, the main slowdown reason being the shitty garbage collector (that you can disable ofc).
Moreover, inline asm is nothing compared to the freshly added OOP-based SIMD implementation.
What are you trying to tell me?
longbyte1 wrote: What are you trying to tell me?
Same here
Icarus North wrote: That's basically all it is anyways since people aren't actively discussing and playing it anymore.
Can't you just, like, not give a fuck?
LeCom wrote:
longbyte1 wrote: What are you trying to tell me?
Same here
Icarus North wrote: That's basically all it is anyways since people aren't actively discussing and playing it anymore.
Can't you just, like, not give a fuck?
impossibruuuuuuuuuuuuuuuuuuuuu