
(FuzzyRom) Some love for our programmers!



Posted

Over the past little while I have noticed that there are not many utilities available to programmers here, and one of the most sorely overlooked is a good fuzzy matching library. That is why I created FuzzyRom (sorry if another product out there has the same name; that was not intentional). FuzzyRom is a library (a compiled DLL) that developers can add to their programs to match two file names and get back an integer similarity score from 0 to 100.

FuzzyRom started when I asked our dear friend emb for help, and he sent me a basic file on fuzzy matching. After reading the source code I decided to get more involved and read up on regular expressions, adding and removing values, substituting fields and so on; out of this brainstorm came FuzzyRom.

How to use:

To use FuzzyRom in your application, add a reference to the compiled DLL from within Visual Studio, then add the namespace to your using directives:

using FuzzyRom;

Once FuzzyRom has been referenced, you can create a new RomMatcher object like so:

var romMatcher = new RomMatcher();

Now you can start configuring the matcher in code. FuzzyRom is extensible and uses lists to hold the words you would like removed from a string. We will now go over the library's options and how they operate.

DefaultWordList:
Type: boolean.

The default word list contains character sequences that are automatically stripped from any string run through the sanitizer. They are as follows:


  • '\'
  • '-'
  • '!'
  • ','
  • 'the'
  • 'of'
  • 'in'
  • 'to'
  • '&' (replaced with 'and')

The items listed above are removed (or replaced) in a string if DefaultWordList is enabled. This option is enabled by default; you can disable it after creating the matcher with romMatcher.DefaultWordList = false.
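To make the list above concrete, here is a minimal C# sketch of a default-word pass. It is not FuzzyRom's source: `DefaultWordSketch` and `Apply` are hypothetical names, and the sketch assumes two things the post doesn't specify, namely that the listed characters are replaced with a space and that the listed words are removed only as whole words (so 'to' leaves "Toad" alone).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of a default-word-list pass; this is NOT FuzzyRom's code.
// Assumptions: listed characters become spaces, listed words are removed only as whole words.
public static class DefaultWordSketch
{
    // Characters from the default list ('\', '-', '!', ',').
    private static readonly char[] StripChars = { '\\', '-', '!', ',' };

    // Words from the default list, matched case-insensitively as whole words.
    private static readonly string[] StripWords = { "the", "of", "in", "to" };

    public static string Apply(string name)
    {
        // '&' is replaced rather than removed, so "Rock & Roll" lines up with "Rock and Roll".
        string result = name.Replace("&", " and ");

        foreach (char c in StripChars)
            result = result.Replace(c.ToString(), " ");

        // \b keeps "to" from matching inside "Toad" or "in" inside "Link".
        foreach (string word in StripWords)
            result = Regex.Replace(result, $@"\b{word}\b", " ", RegexOptions.IgnoreCase);

        // Collapse the runs of spaces left behind by the removals.
        return Regex.Replace(result, @"\s+", " ").Trim();
    }
}
```

For example, `DefaultWordSketch.Apply("The Legend of Zelda - A Link to the Past")` returns `"Legend Zelda A Link Past"`.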

RemoveTags:
Type: boolean.

RemoveTags strips bracketed tags from a string. Tags are commonly found after a ROM's proper file name and identify things such as the manufacturer of a game, whether the ROM is a bad dump, the year it was produced, and which languages the game supports. While that information is useful in itself, when matching files the tags are more often than not more trouble than they are worth.

With this option enabled, tags are removed as shown below:

Before: Super Mario Bros. 3 (USA) (!) [En, De, Es, Ja]
After: Super Mario Bros. 3

Any tag starting with '(' or '[' is removed from the file name, leaving only the bare name. This option is enabled by default; you can disable the tag stripper after creating the matcher with romMatcher.RemoveTags = false.
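A tag stripper along these lines fits in one regular expression. This is a sketch, not FuzzyRom's implementation; `TagStripSketch` is a hypothetical name, and it assumes every '(' or '[' has a matching ')' or ']' (an unclosed bracket is simply left in place).

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of a tag-stripping pass; NOT FuzzyRom's actual code.
public static class TagStripSketch
{
    // Matches a "(...)" or "[...]" group together with any whitespace in front of it.
    private static readonly Regex TagPattern =
        new Regex(@"\s*(\([^)]*\)|\[[^\]]*\])");

    public static string Apply(string name) =>
        TagPattern.Replace(name, "").Trim();
}
```

`TagStripSketch.Apply("Super Mario Bros. 3 (USA) (!) [En, De, Es, Ja]")` returns `"Super Mario Bros. 3"`. Note that a trailing extension survives this pass ("Super Mario Kart 64 (1996) (!) [En, Ja, De].zip" becomes "Super Mario Kart 64.zip"), which is what RemoveFileExtension is for.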

MatchCase:
Type: boolean.

MatchCase enables case-sensitive matching, for whatever reason you may need it. With MatchCase on, file names are not lowercased, and results will be dramatically affected:

MatchCase On:

Super Mario Bros. 3 == Super Mario Bros. 3

MatchCase Off:

Super Mario Bros. 3 == super mario bros. 3

While case matching may seem trivial, please use it with caution: to a machine 'M' is not the same as 'm', so results will differ considerably. I personally do not see a need to ever turn this option on, but it's your choice. This option is disabled by default; you can enable it after creating the matcher with romMatcher.MatchCase = true.
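The effect of MatchCase boils down to whether names are lowercased before they are compared. A minimal sketch, where `CaseSketch` is a hypothetical helper and not FuzzyRom's code; it uses `ToLowerInvariant` (a choice made for this sketch, not something the post states about FuzzyRom) so the result doesn't depend on the user's culture settings, such as Turkish dotless-i rules.

```csharp
using System;

// Hypothetical sketch of the case step; the matchCase flag mirrors the documented option.
public static class CaseSketch
{
    // With matchCase on, the name is left untouched; otherwise it is lowercased.
    public static string Apply(string name, bool matchCase) =>
        matchCase ? name : name.ToLowerInvariant();

    // Compares two names after the case step, using an exact (ordinal) comparison.
    public static bool SameName(string a, string b, bool matchCase) =>
        string.Equals(Apply(a, matchCase), Apply(b, matchCase), StringComparison.Ordinal);
}
```

With `matchCase` false, "Super Mario Bros. 3" and "super mario bros. 3" compare as equal; with it true, they do not.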

RemoveFileExtension:
Type: boolean.

RemoveFileExtension strips a file's extension from the name when it passes through the sanitize process. If an extension is detected, it is removed from the cleaned string.

RemoveFileExtension On:

Super Mario Bros. 3 (USA).nes == Super Mario Bros. 3 (USA)

RemoveFileExtension Off:

Super Mario Bros. 3 (USA).nes == Super Mario Bros. 3 (USA).nes

The purpose of this option is to remove the need to parse a file name yourself before running it through the sanitize process. It comes in especially handy if you are writing an imaging tool or another application that uses file names as its primary input. This option is disabled by default; you can enable it after creating the matcher with romMatcher.RemoveFileExtension = true.
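Stripping an extension safely is slightly trickier than it looks, because ROM names often contain dots of their own. The sketch below is hypothetical (`ExtensionSketch` is not part of FuzzyRom) and assumes an extension is a final dot followed by 1 to 5 letters or digits, at least one of them a letter. That rule is why it avoids `Path.GetFileNameWithoutExtension`, which treats everything after the last dot as the extension and would turn "Super Mario Bros. 3" into "Super Mario Bros".

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of an extension-stripping pass; NOT FuzzyRom's code.
// Assumption: an extension is a trailing ".xxx" of 1-5 letters/digits containing at least one letter,
// so ".nes", ".zip", ".7z" and ".z64" are removed while "Bros. 3" and "Vol.2" are kept.
public static class ExtensionSketch
{
    private static readonly Regex ExtensionPattern =
        new Regex(@"\.(?=[A-Za-z0-9]*[A-Za-z])[A-Za-z0-9]{1,5}$");

    public static string Apply(string name) => ExtensionPattern.Replace(name, "");
}
```

One caveat of this rule: a name that ends in a dotted abbreviation with no space, such as "Mr.Do", would still lose its last part.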

UseCustomWordList:
Type: boolean.

UseCustomWordList tells the library whether you want to use a custom strip list. Much like DefaultWordList, you can define words (1 to 1000) to strip from a file name; please see CustomWordList below. You can enable the custom word list after creating the matcher with romMatcher.UseCustomWordList = true.

CustomWordList:
Type: List<string>.

The custom word list holds all the words you want removed during the sanitize process. The list is public, so you can add words to it, remove words from it, or search it.

Please Note:
UseCustomWordList must be enabled, otherwise CustomWordList is ignored.


  • Adding a word: romMatcher.CustomWordList.Add("Super");
  • Clearing the list: romMatcher.CustomWordList.Clear();

Any words in this list are stripped from a file name during the sanitize process. Please try not to add too many words to the list (I would suggest ten at most), otherwise the chance of a false positive match increases greatly.
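The interplay between UseCustomWordList and CustomWordList can be sketched as follows. `CustomWordSketch` is a hypothetical stand-in that only mirrors the two documented option names; the post doesn't say how FuzzyRom matches custom words (whole words or substrings, case handling), so this sketch assumes case-insensitive whole-word removal.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of a custom strip list; mirrors the documented option names only,
// NOT FuzzyRom's implementation. Assumes case-insensitive, whole-word removal.
public class CustomWordSketch
{
    public bool UseCustomWordList { get; set; }
    public List<string> CustomWordList { get; } = new List<string>();

    public string Apply(string name)
    {
        // Mirrors the documented rule: the list is ignored unless the flag is on.
        if (!UseCustomWordList) return name;

        string result = name;
        foreach (string word in CustomWordList)
        {
            // Regex.Escape keeps entries like "Vol." from being read as regex syntax;
            // the lookarounds act as word boundaries even when an entry ends in punctuation.
            string pattern = $@"(?<!\w){Regex.Escape(word)}(?!\w)";
            result = Regex.Replace(result, pattern, " ", RegexOptions.IgnoreCase);
        }
        return Regex.Replace(result, @"\s+", " ").Trim();
    }
}
```

With "Super" added and the flag on, "Super Mario Bros. 3" becomes "Mario Bros. 3", while "Superman Returns" is left alone.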

Full Usage:

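    using FuzzyRom;
    using System.Windows.Forms;

    const string word1 = "Super Mario Karfgfrt 64 (USA)";
    const string word2 = "Super Mario Kart 64 (1996) (!) [En, Ja, De].zip";

    // Strip ".zip" from word2; RemoveTags and DefaultWordList are already on by default.
    var matcher = new RomMatcher { RemoveFileExtension = true };
    matcher.DefaultWordList = true; // redundant (enabled by default), shown for clarity

    // Score once (0-100) and reuse the result instead of calling MatchRoms twice.
    var score = matcher.MatchRoms(word1, word2);
    if (score >= 75)
    {
        MessageBox.Show(score.ToString());
        MessageBox.Show(@"A match has been found.");
    }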
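The `>= 75` check above compares against FuzzyRom's 0 to 100 score. The post doesn't say how that score is computed, so the following is only one common approach, offered as a hypothetical sketch (`ScoreSketch` is not part of FuzzyRom): Levenshtein edit distance, scaled by the length of the longer name.

```csharp
using System;

// Hypothetical scoring sketch; FuzzyRom's real algorithm is not described in the post.
public static class ScoreSketch
{
    // Minimum number of single-character insertions, deletions or substitutions turning a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes the previous row
        }
        return prev[b.Length];
    }

    // 100 means identical; 0 means every character of the longer name had to change.
    public static int Score(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 100;
        return (int)Math.Round(100.0 * (longest - Levenshtein(a, b)) / longest);
    }
}
```

If the two example names were sanitized to "super mario karfgfrt 64" (23 characters) and "super mario kart 64", they would differ by four deleted characters, so this sketch would score them 100 * 19 / 23 ≈ 82.6, rounded to 83, which clears the 75 threshold.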

Well, there it is: my little API write-up on how to use FuzzyRom. I hope someone out there finds a good use for it, as it is a powerful little tool for its size (13 KB). It can be extended almost endlessly through the options I have included. Personally, I would like to thank emb for the inspiration to make this tool and for giving me some reading material as well. I have attached the download to this thread; enjoy, guys!

-RLH

FuzzyRom.zip

Posted

This API is awesome. I will be using it in the near future. There are many systems out there where CRCs have not yet been calculated.

Do we have the ability to override or extend the sanitizer? I have seen false positive matches in other applications because some games use Roman numerals (e.g. "II" instead of "2").

Another common trigger for false positives is that some names are reversed depending on the naming source, e.g. "The Lost Caverns - Pitfall 2" vs. "Pitfall 2 - The Lost Caverns".

A third thing that has generated false positives for me is the single quote ('). For example, depending on the renaming tool, Madden '98 will sometimes match Madden 99 instead of Madden 98.

You might also want to consider sanitizing "_" by default.
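Whether FuzzyRom's sanitizer can be overridden isn't answered in this thread. One workaround that doesn't depend on it is to normalize names yourself before passing them to MatchRoms. The sketch below is hypothetical (`PrePassSketch` is not part of FuzzyRom): it treats "_", "-" and "'" as separators, maps the Roman numerals II to IX to digits, and sorts the words so word order no longer matters. "I" and "X" are deliberately left out because they are often real words or part of a title (e.g. "Mega Man X").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical caller-side pre-pass; NOT part of FuzzyRom as described in this thread.
public static class PrePassSketch
{
    // Roman numerals II-IX only; "I" and "X" are too ambiguous in game titles.
    private static readonly Dictionary<string, string> Numerals =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["II"] = "2", ["III"] = "3", ["IV"] = "4", ["V"] = "5",
            ["VI"] = "6", ["VII"] = "7", ["VIII"] = "8", ["IX"] = "9",
        };

    public static string Normalize(string name)
    {
        // Underscores, hyphens and apostrophes become spaces ("Madden '98" -> "Madden  98").
        string spaced = Regex.Replace(name, @"[_\-']", " ");

        var tokens = spaced
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => Numerals.TryGetValue(t, out var digit) ? digit : t.ToLowerInvariant())
            .OrderBy(t => t, StringComparer.Ordinal); // sorting makes word order irrelevant

        return string.Join(" ", tokens);
    }
}
```

Both "The Lost Caverns - Pitfall II" and "Pitfall 2 - The Lost Caverns" normalize to "2 caverns lost pitfall the", while "Madden '98" ("98 madden") stays distinct from "Madden_99" ("99 madden"). Sorting does throw away word order, so titles that differ only in order would also collapse together.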


Archived

This topic is now archived and is closed to further replies.
