Wikipedia:Reference desk/Computing: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 19:53, 16 January 2024

Welcome to the computing section
of the Wikipedia reference desk.

How can I get my question answered?

  • Select the section of the desk that best fits the general topic of your question (see the navigation column to the right).
  • Post your question to only one section, providing a short header that gives the topic of your question.
  • Type '~~~~' (that is, four tilde characters) at the end – this signs and dates your contribution so we know who wrote what and when.
  • Don't post personal contact information – it will be removed. Any answers will be provided here.
  • Please be as specific as possible, and include all relevant context – the usefulness of answers may depend on the context.
  • Note:
    • We don't answer (and may remove) questions that require medical diagnosis or legal advice.
    • We don't answer requests for opinions, predictions or debate.
    • We don't do your homework for you, though we'll help you past the stuck point.
    • We don't conduct original research or provide a free source of ideas, but we'll help you find information you need.



How do I answer a question?

Main page: Wikipedia:Reference desk/Guidelines

  • The best answers address the question directly, and back up facts with wikilinks and links to sources. Do not edit others' comments and do not give any medical or legal advice.

January 14

How are quantum computers different than normal computers we know?

What exactly will change as quantum computers come into wider use? Will they completely overtake the currently popular types of computers? Yashrajkarthike (talk) 07:27, 14 January 2024 (UTC)

As stated above, we can't answer requests for predictions. Have you read our comprehensive article on Quantum computing? Shantavira|feed me 08:50, 14 January 2024 (UTC)
No, but thanks for reminding me. I will check out the article for more details. Thanks! Yashrajkarthike (talk) 10:04, 14 January 2024 (UTC)


January 16

Algorithm to match U.S. state names to USPS state codes

Re this mini project, I wish to match a list of USPS state codes (set A) with a list of U.S. state and territory names which has spelling variations (set B) in any order. Set B may have missing items, in which case set A matches [null]. This is an example successful match:

Set A: CO, CT, DC, VA, WA, WV
Set B: Conn., Wash., Washington (District of Columbia), W. Virginia

yields

Set A Set B
CO [null]
CT Conn.
DC Washington (District of Columbia)
VA [null]
WA Wash.
WV W. Virginia

Can someone please suggest a robust, performant yet simple to program algorithm? The target language is JavaScript.

Thanks,
cmɢʟee τaʟκ 09:03, 16 January 2024 (UTC)

@Cmglee: I've re-read this multiple times and can't understand what this program is supposed to do. Are you trying to convert a natural-language name to a USPS code, where a name could have many variations, e.g. DC being "District of Columbia", "Washington DC" or "Washington (District of Columbia)"? Is that what you're trying to do? Panamitsu (talk) 10:28, 16 January 2024 (UTC)
Example map
Thanks for your reply, @Panamitsu: Sorry if I didn't make it clear. Yes, I effectively wish to convert a natural-language name to a USPS code. It helps that the names are presented in one file, and each state or territory occurs at most once, so if one name somewhat matches one code, no other name can match that code. Here's my use-case:
I have created a template SVG map showing the states of the USA (and Washington DC). In the SVG code, I refer to the states' shape and labels with the USPS codes AK, AL, AR, AZ etc.
My collaborator, Timeshifter, however, receives data in the following format:
Alabama 41.4
Alaska
Arizona 31.4
Arkansas 43.5
The order of the states or their exact spelling cannot be guaranteed. Some states may be missing data: in the above, I have "Alaska" without a number, but that line could be left out entirely.
The JavaScript can have a lookup table with their nominal spellings, but may need to make approximate matches, e.g. using Levenshtein distances, to find the best mapping. In my contrived example, the lookup table might have "CO → Colorado, CT → Connecticut", so when the data has "Conn.", it figures that "CT" is more likely than "CO".
Does it make sense now? It sounds like this is already a common, solved problem in computer science. Thanks, cmɢʟee τaʟκ 11:38, 16 January 2024 (UTC)
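The input format described above (a name, optionally followed by a number, one state per line) could be read along the following lines. This is only a minimal sketch: parseStateData is a hypothetical helper name, and it assumes the value, when present, is a plain decimal number at the end of the line.

```javascript
/**
 * Parse lines such as "Alabama 41.4" into { name, value } records.
 * The trailing number is optional; when it is absent, value is null.
 * (parseStateData is a hypothetical name used only for this sketch.)
 * @param {string} text the raw data, one state per line
 * @returns {Array<{name: string, value: ?number}>}
 */
function parseStateData(text) {
  const records = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines

    // Name (may contain spaces), optionally followed by a number at the end.
    const match = /^(.*?)(?:\s+(-?\d+(?:\.\d+)?))?$/.exec(line);
    if (!match) continue; // defensive: ignore lines the pattern cannot read

    const value = match[2] === undefined ? null : Number(match[2]);
    records.push({ name: match[1], value });
  }
  return records;
}

// Usage with the sample lines from the question, plus a two-word name.
const sample = "Alabama 41.4\nAlaska\nArizona 31.4\nNew Mexico 12";
console.log(parseStateData(sample));
```

Each record's name would still have to be converted to a USPS code in a separate step; the parser deliberately does no matching.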
Your lookup table (which should be a map, not a table) is backwards. Let's assume it is called "map" in your code. If you get "Missouri", you set state=map["Missouri"] because map["Missouri"]="MO". It is likely that many people have made this map for their code. You can make it for yours. You have 51 states (including Washington DC), so you type it in. If you get a spelling you haven't seen before, add it. Because of the way this map is made, you can have map["Colorado"] = "CO", map["Col."] = "CO", map["colarodo"] = "CO", etc. 12.116.29.106 (talk) 16:45, 16 January 2024 (UTC)
I would not use Levenshtein distances; I would just put common spellings into the object. I would normalize the case (for example, to lowercase) before the lookup. If a match is not found, then add the failing string to the map.
/**
 * Table that maps lowercase state names to USPS abbreviations
 * @type {Object.<string, string>}
 */
const mapStateToUSPS = { "al" : "AL", "alabama" : "AL", "ak" : "AK", "alaska" : "AK", "ca" : "CA", "calif" : "CA", "california" : "CA" }; // etc...

/**
 * Convert a state name to a USPS abbreviation
 * @param {string} name name of the state
 * @returns {?string} the USPS abbreviation, or null if the name is not in the map
 */
function lookupUSPS(name) {
  // normalize the name: ignore case and surrounding whitespace
  // could also remove punctuation: "N. Carolina" to "N Carolina"
  const key = name.trim().toLowerCase();

  // own-property check so inherited keys such as "constructor" never match
  if (Object.prototype.hasOwnProperty.call(mapStateToUSPS, key))
    return mapStateToUSPS[key];
  return null;
}
Glrx (talk) 18:56, 16 January 2024 (UTC)
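The punctuation removal mentioned in the code comment, and the idea of collecting failing strings so they can be added to the map later, might look roughly like this. It is a sketch under assumptions: variantToUSPS, normalizeName, lookupNormalized, unmatched and matchCodesToNames are hypothetical names, and the map is deliberately incomplete.

```javascript
// Hypothetical, deliberately incomplete map: normalized spelling -> USPS code.
const variantToUSPS = new Map([
  ["ct", "CT"], ["conn", "CT"], ["connecticut", "CT"],
  ["dc", "DC"], ["district of columbia", "DC"],
  ["washington district of columbia", "DC"],
  ["nc", "NC"], ["n carolina", "NC"], ["north carolina", "NC"],
  ["wa", "WA"], ["wash", "WA"], ["washington", "WA"],
  ["wv", "WV"], ["w virginia", "WV"], ["west virginia", "WV"],
]);

/** Lowercase, turn punctuation into spaces, collapse runs of whitespace. */
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // assumption: names use only ASCII letters
    .replace(/\s+/g, " ")
    .trim();
}

// Spellings that failed to match, kept so they can be added to the map later.
const unmatched = new Set();

/** Return the USPS code for a name, or null (and record the failure). */
function lookupNormalized(name) {
  const key = normalizeName(name);
  if (variantToUSPS.has(key)) return variantToUSPS.get(key);
  unmatched.add(name);
  return null;
}

/** Pair every code in set A with the matching name from set B, or null. */
function matchCodesToNames(codes, names) {
  const nameByCode = new Map();
  for (const name of names) {
    const code = lookupNormalized(name);
    if (code !== null && !nameByCode.has(code)) nameByCode.set(code, name);
  }
  return codes.map((code) => [code, nameByCode.get(code) ?? null]);
}

// Usage with the sets from the original question.
const setA = ["CO", "CT", "DC", "VA", "WA", "WV"];
const setB = ["Conn.", "Wash.", "Washington (District of Columbia)", "W. Virginia"];
for (const [code, name] of matchCodesToNames(setA, setB)) {
  console.log(code, name === null ? "[null]" : name);
}
```

Because the lookup is exact after normalization, a spelling that is not in the map produces null rather than a wrong guess, which is the trade-off this approach accepts.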
For the example given, you cannot use Levenshtein distances against the codes themselves. The example is CO and CT with the abbreviation Conn. Ignoring case, the distance between CO and Conn is 2 and the distance between CT and Conn is 3. CO is less distant than CT, so it will claim that CO, not CT, is the better match. If you really want to go down this route, you have to do a Levenshtein check between the abbreviation Conn and all full state names. But then you have a string-length weakness: Conn to Colorado has a distance of 6 and Conn to Connecticut has a distance of 7, so the raw distance still prefers Colorado. If you divide the distance by the maximum possible distance (the length of the longer string), Conn to Colorado is 6/8 = 0.75 and Conn to Connecticut is 7/11 ≈ 0.64. Now 0.64 is less than 0.75, so Connecticut wins. But you will certainly hit more problems. You are much better off making a very complete map of all abbreviations. 12.116.29.106 (talk) 19:53, 16 January 2024 (UTC)
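The distances discussed above can be checked with a short sketch of the standard dynamic-programming Levenshtein algorithm. The function names levenshtein and normalizedDistance are chosen here for illustration, and the strings are compared in lowercase. Note that the full 11-letter spelling "connecticut" gives a distance of 7 from "conn", so its ratio is 7/11 ≈ 0.64.

```javascript
/**
 * Levenshtein distance: the minimum number of single-character insertions,
 * deletions and substitutions needed to turn string a into string b.
 */
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a
  // and the first j characters of b (initially i-1 = 0).
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution, or free match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Distance divided by the length of the longer string (0 = identical). */
function normalizedDistance(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : levenshtein(a, b) / longest;
}

console.log(levenshtein("co", "conn")); // 2
console.log(levenshtein("ct", "conn")); // 3
console.log(levenshtein("conn", "colorado")); // 6
console.log(levenshtein("conn", "connecticut")); // 7
console.log(normalizedDistance("conn", "colorado")); // 0.75
console.log(normalizedDistance("conn", "connecticut").toFixed(2)); // "0.64"
```

Normalizing by length happens to rank Connecticut ahead of Colorado here, but it is a heuristic and gives no guarantee for other abbreviations.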