RPGDXThe center of Indie-RPG gaming
Not logged in. [log in] [register]
 
 
Post new topic Reply to topic  
View previous topic - View next topic  
Author Message
Mattias Gustavsson
Mage


Joined: 10 Nov 2007
Posts: 457
Location: Royal Leamington Spa, UK

PostPosted: Tue Jun 02, 2009 6:01 pm    Post subject: Using strings for IDs [quote]

Sometimes, it's nice to be able to identify things by name, rather than using a number or an enum.

The problem is, that if you're not careful, using strings can take a lot of memory, and LOADS of CPU-power, with all the stricmp's. When I was transferred to the Tycoon City project, with the task of speeding up the game (which was *really* slow at that point), it turned out they were using about 400MB for storing strings (of which most was used as ID:s, and was mostly duplicates of duplicates), and that string comparisons and string copying was right at the top of the profiling results. I won't go into how I went about solving that particular problem - it as late in the project, and things turned out the way they turned out - but it got significantly faster and smaller, that's for sure.

But ever since then, I'm always a bit wary about using strings - it can easily get out of hand, and it feels wrong to waste both memory and CPU-power on something if you don't have to. So, I thought I'd share my own system for solving this problem, extracted from my (recently cleaned up) Pixie Game Engine. It's a solid, fast solution, ready to just drop into any project, and consists of only two classes.

At its core is a class I call StringId, which is easily created by:

Code:
StringId myStringIdVariable("myOwnIdNameHere");


Which internally results in the string "myOwnIdNameHere" being located in a global string table (non-case-sensitive), or inserted if it's not already added. The lookup is done through a pretty fast hash-table. In the StringId object is stored a pointer to the shared string, so you can do:

Code:
if (myStringIdVariable==anotherStringIdVariable)


and know that it will result, internally, in just a comparison of two pointers. Nice and fast. And takes less memory too, as a string is only stored in one place - the global string table.

There's also a couple of macros defined (they're nifty at times), strSwitch and strCase, which can be used to compare StringId's in a way similar to switch-case statements:

Code:
        strSwitch (myFruitTypeStringId)
            {
            strCase(Apples)
                {
                // code to do stuff here
                }
 
            strCase(Oranges)
                {
                // code to do other stuff here
                }
            }
 


Which is the same as doing this, but looks a bit nicer (though that is, of course, a matter of taste :D )
Code:
        static StringId applesId("Apples");
        if (myFruitTypeStringId==applesId)
            {
            // code to do stuff here
            }
 
        static StringId orangesId("Oranges");
        if (myFruitTypeStringId==orangesId)
            {
            // code to do other stuff here
            }
 
 


The code is well-written and well-commented (for once), and can be downloaded here. It is C++, and as usual it's public domain - use it anyway you like :D
_________________
www.mattiasgustavsson.com - My blog
www.rivtind.com - My Fantasy world and isometric RPG engine
www.pixieuniversity.com - Software 2D Game Engine
Back to top  
DeveloperX
202192397


Joined: 04 May 2003
Posts: 1626
Location: Decatur, IL, USA

PostPosted: Tue Jun 02, 2009 8:52 pm    Post subject: [quote]

Nice. finally some explanation to that cryptic mess XD
I'll have a look through it sometime later. :)
_________________
Principal Software Architect
Rambling Indie Games, LLC

See my professional portfolio
Back to top  
Ninkazu
Demon Hunter


Joined: 08 Aug 2002
Posts: 945
Location: Location:

PostPosted: Wed Jun 03, 2009 12:06 am    Post subject: [quote]

Why not have string ids in the global table also have an integer id (a simple index) so when a script is loaded you can replace the ids at load time and have truly constant lookup rather than amortized? Or is this so you can hardcode using string ids rather than use scripts? *shudder*
Back to top  
Rainer Deyke
Demon Hunter


Joined: 05 Jun 2002
Posts: 672

PostPosted: Wed Jun 03, 2009 2:17 am    Post subject: [quote]

This seems to be a fairly common pattern. It looks like my own identifier class, and bears more than a passing resemblance to boost::flyweight.

It looks like you're preserving the hash value of the string for when the StringId is used as the key in another hash map. I think this is unnecessary. Since StringId already contains a pointer to a shared string, why not use that pointer directly as a hash value? i.e.
Code:
unsigned int StringId::GetHash() const {
  return reinterpret_cast<unsigned>(this->isString_);
}
This would cut the size of your StringId objects in half.
Back to top  
RedSlash
Mage


Joined: 12 May 2005
Posts: 331

PostPosted: Wed Jun 03, 2009 2:40 am    Post subject: [quote]

Quote:
why not use that pointer directly as a hash value?

How will he determine the id of a string with a different pointer address, but equal to a value of a string in the global table without doing a linear search on the global table using only the pointer address? Sorry, I didn't go through his code, but I suspect Matt's got the right idea.
Back to top  
Rainer Deyke
Demon Hunter


Joined: 05 Jun 2002
Posts: 672

PostPosted: Wed Jun 03, 2009 4:22 am    Post subject: [quote]

RedSlash wrote:
Quote:
why not use that pointer directly as a hash value?

How will he determine the id of a string with a different pointer address, but equal to a value of a string in the global table without doing a linear search on the global table using only the pointer address? Sorry, I didn't go through his code, but I suspect Matt's got the right idea.


Because he already uses shared strings.

Matt's code:
- Calculate hash value.
- Use hash value to look up shared string.
- Store the pointer to the shared string.
- Also store the calculated hash value for future lookups using the StringId as key.

My suggestion:
- Calculate hash value.
- Use hash value to look up shared string.
- Store the pointer to the shared string.
- Use the pointer to the shared string as hash value for future lookups using the StringId as key.
Back to top  
Mattias Gustavsson
Mage


Joined: 10 Nov 2007
Posts: 457
Location: Royal Leamington Spa, UK

PostPosted: Wed Jun 03, 2009 10:24 am    Post subject: [quote]

Yes, Mr. Deyke is quite right - you could just as well use the string pointer as a hash value.

As pointed out, my code does:
- Calculate hash value.
- Use hash value to look up shared string.
- Store the pointer to the shared string.
- Also store the calculated hash value for future lookups using the StringId as key. I want to clarify, that these future lookups has nothing to do with the StringId system itself - at this point, the shared string is already found. But sometimes, I make use of another class, called HashTable, which can store and quickly look up elements based on a hash value. In those cases, I can reuse the hash table stored in the StringId class, rather than recalculate it.

I've found that the hashing function gives me a significantly more even distribution (less collisions) in my HashTable than using the shared string pointers, so that's my reasoning for storing it along with the StringId.

But I realize that this is not useful as part of this easy-drop-in version of the StringId system, so I've uploaded another version with that bit taken out - thanks to Rainer for pointing it out :-)

Download the version without the extra hash value here
_________________
www.mattiasgustavsson.com - My blog
www.rivtind.com - My Fantasy world and isometric RPG engine
www.pixieuniversity.com - Software 2D Game Engine
Back to top  
tcaudilllg
Dragonmaster


Joined: 20 Jun 2002
Posts: 1731
Location: Cedar Bluff, VA

PostPosted: Thu Jun 04, 2009 11:44 am    Post subject: [quote]

That is insightful. Good job.
Back to top  
Hajo
Demon Hunter


Joined: 30 Sep 2003
Posts: 779
Location: Between chair and keyboard.

PostPosted: Fri Jun 05, 2009 8:47 am    Post subject: [quote]

Not sure how useful this is, but Java String class has a method called "intern" that puts a string into a global table, so that you can compare all interned string objects by their references, not needing to compare the contents.

All string literals in Java programs are automatically interned.

This seems to be much like what Matthias was doing. It might be a sign that his idea was good, or a sign that proper languages bring the feature with them by default ;)
Back to top  
Mattias Gustavsson
Mage


Joined: 10 Nov 2007
Posts: 457
Location: Royal Leamington Spa, UK

PostPosted: Sun Jun 14, 2009 7:28 pm    Post subject: [quote]

After a lot of discussions, testing and optimizing (you can read the whole disussion here if you understand Swedish), I have a new version of the system, which is even faster and takes less memory (and makes a lot less dynamic allocations):

http://www.colossusentertainment.com/forumref/pixie_stringid_v2.zip
_________________
www.mattiasgustavsson.com - My blog
www.rivtind.com - My Fantasy world and isometric RPG engine
www.pixieuniversity.com - Software 2D Game Engine
Back to top  
cowgod
Wandering Minstrel


Joined: 22 Nov 2005
Posts: 114
Location: Pittsburgh, USA

PostPosted: Sat Jun 20, 2009 3:08 am    Post subject: [quote]

If you want it to be faster (especially for large string tables), you could try using a "trie" instead. You can get the general idea of what one is by looking it up in Wikipedia. The algorithms book I read that mentioned tries (Algorithms in C++, Volumes 1-4, 3rd edition) included a slight variation where you only have nodes up to the point at which there's no chance of the result being a different string.

Tries have an interface similar to Hash Maps, but you don't need to deal with hash codes.

Although speed might be improved, the memory usage will most likely be greater.
Back to top  
cowgod
Wandering Minstrel


Joined: 22 Nov 2005
Posts: 114
Location: Pittsburgh, USA

PostPosted: Sat Jun 20, 2009 3:08 am    Post subject: [quote]

If you want it to be faster (especially for large string tables), you could try using a "trie" instead. You can get the general idea of what one is by looking it up in Wikipedia. The algorithms book I read that mentioned tries (Algorithms in C++, Volumes 1-4, 3rd edition) included a slight variation where you only have nodes up to the point at which there's no chance of the result being a different string.

Tries have an interface similar to Hash Maps, but you don't need to deal with hash codes.

Although speed might be improved, the memory usage will most likely be greater.
Back to top  
cowgod
Wandering Minstrel


Joined: 22 Nov 2005
Posts: 114
Location: Pittsburgh, USA

PostPosted: Sat Jun 20, 2009 3:08 am    Post subject: [quote]

If you want it to be faster (especially for large string tables), you could try using a "trie" instead. You can get the general idea of what one is by looking it up in Wikipedia. The algorithms book I read that mentioned tries (Algorithms in C++, Volumes 1-4, 3rd edition) included a slight variation where you only have nodes up to the point at which there's no chance of the result being a different string.

Tries have an interface similar to Hash Maps, but you don't need to deal with hash codes.

Although speed might be improved, the memory usage will most likely be greater.
Back to top  
Mattias Gustavsson
Mage


Joined: 10 Nov 2007
Posts: 457
Location: Royal Leamington Spa, UK

PostPosted: Mon Jun 22, 2009 6:44 pm    Post subject: [quote]

Well, we did a basic implementation of tries, and it was a lot slower. I'm sure it could be sped up quite a lot though, if proper care was taken when implementing it (ie. not using STL/Boost for example), but the current StringId implementation is pretty fast, so I don't feel a need to do change the way it works.

The timings looked like this, though it doesn't say much really, as it's comparing my optimized StringId implementation against the unoptimized Tries implementation


however, if someone wants to try and see if they can make their own faster implementation (which can be a great learning experience), there's a test-bed (including the basic Tries implementation) here:

stringid testbed (note: the testbed requires boost, for the Tries implementation, and contains an earlier version of StringId).

it creates and then locates 100,000 strings, and can either randomly generate them (so all of them is different) or read them from a text-file (the full text of "dr. jekyll and mr. hyde", which is included), resulting in lots of duplicates, which might be a more realistic test case.

I've found that it can be really quite interesting to try these sort of things, as you usually learn quite a bit about what's fast and what isn't.
_________________
www.mattiasgustavsson.com - My blog
www.rivtind.com - My Fantasy world and isometric RPG engine
www.pixieuniversity.com - Software 2D Game Engine
Back to top  
Post new topic Reply to topic Page 1 of 1 All times are GMT
 



Display posts from previous:   
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum