Count the number of times a word appears in a string









I have a big string where I need to:

  1. Convert words starting with an upper-case letter to lower case, so that all words are lower case.

  2. Count the number of times each word occurs.

    • A word in this sense is a sequence of characters without whitespace
      or punctuation (!#= etc.)

  3. Sort the words from most frequent to least frequent.

I've made a function to read a .txt file and turn it into a string.

But I'm not sure where to go from here, and any kind of help with any of the bullet points would be greatly appreciated.







f#






edited Nov 9 at 20:25

asked Nov 9 at 20:13
jubibanna

  • 2
    I didn't downvote, but the person who did most likely did so because you're asking for someone to write some code without showing what you've tried. As it appears you're new to F#, I've given an answer that explains one way you could solve this problem by thinking in a functional way. Hopefully this is a springboard to get you going with F# and solving problems. As F# runs on .NET, you can sometimes get inspiration from C# (like the punctuation code).
    – DaveShaw
    Nov 9 at 21:32

  • 1
    Ah, okay. I should've said that. I already did try: I was able to split the string so it printed words instead of chars, but I thought posting my ugly code would make my post more confusing. I was also trying to use Regex.Escape to at least count the occurrences of a word, and was looking at this: stackoverflow.com/questions/40385154/…
    – jubibanna
    Nov 9 at 21:59

  • 1
    It always helps to post what you have... That code is for when you know which word you're looking for and want to count it. It could be adapted to solve your problem, but I'd always prefer a groupBy to get a count if you want to count all instances.
    – DaveShaw
    Nov 9 at 23:59












2 Answers
Score: 3, accepted






Let's go through this step by step, creating a function for each part:

Convert words starting with an upper-case to a lower-case word so that all words are lower case.

Start by splitting the string into a sequence of words (the lower-casing itself happens in formatWord further down):

    let getWords (s: string) =
        s.Split(' ')

Turns "hello world" into [|"hello"; "world"|].
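Splitting on a single space leaves empty entries when there are consecutive spaces, and it does not break on tabs or newlines, which matters for text read from a file. The following is a minimal sketch, not part of the original answer, assuming that whitespace is the only separator you care about; `getWordsAnyWhitespace` and `separators` are names made up for this example.

```fsharp
open System

// Hypothetical separator set: space, tab, carriage return, newline.
let separators = [| ' '; '\t'; '\r'; '\n' |]

// Hypothetical variant of getWords: split on any of the separators
// and drop the empty entries produced by consecutive separators.
let getWordsAnyWhitespace (s: string) =
    s.Split(separators, StringSplitOptions.RemoveEmptyEntries)

printfn "%A" (getWordsAnyWhitespace "hello   world\nhello\tagain")
// [|"hello"; "world"; "hello"; "again"|]
```

The result is still a string array, so it can be piped into the remaining steps unchanged.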




Sort the amount of times a word is shown. A word in this sense is a sequence of characters without whitespaces or punctuation (!#= etc)

Part #1: Format a word in lower case without punctuation:

    open System

    // Char.IsPunctuation alone misses symbols such as '=', so Char.IsSymbol is checked as well.
    let isNotPunctuation (c: char) =
        not (Char.IsPunctuation(c) || Char.IsSymbol(c))

    let formatWord (s: string) =
        let chars =
            s.ToLowerInvariant()
            |> Seq.filter isNotPunctuation
            |> Seq.toArray

        new String(chars)

Turns "Hello!" into "hello".
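The question lists `!#=` as examples of punctuation, but .NET classifies `=` as a symbol rather than as punctuation, so a filter based only on `Char.IsPunctuation` would keep it. The small check below illustrates this; `classify` is a hypothetical helper introduced only for this example.

```fsharp
open System

// Hypothetical helper: report how .NET classifies a character
// as a pair (is punctuation, is symbol).
let classify (c: char) =
    (Char.IsPunctuation(c), Char.IsSymbol(c))

for c in "!#=" do
    printfn "%c -> %A" c (classify c)
// ! -> (true, false)
// # -> (true, false)
// = -> (false, true)
```

If the goal is to keep only letters and digits, filtering with `Char.IsLetterOrDigit` is another option, at the cost of also dropping characters such as apostrophes and hyphens.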



Part #2: Group the list of words by the formatted version of each word.

    let groupWords (words: string seq) =
        words
        |> Seq.groupBy formatWord

This returns a sequence of tuples: the first part of each tuple is the key (the result of formatWord) and the second part is the sequence of original words that share that key.

Turns ["hello"; "world"; "hello"] into

    [("hello", ["hello"; "hello"]);
     ("world", ["world"])]
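If only the counts are needed, each group can be reduced to its size. The sketch below is not from the answer: `countWords` is a hypothetical name, and it lower-cases the words inline instead of calling `formatWord` so that it runs on its own.

```fsharp
// Hypothetical helper: pair each distinct word with how often it occurs.
let countWords (words: string seq) =
    words
    |> Seq.groupBy (fun w -> w.ToLowerInvariant())          // group case-insensitively
    |> Seq.map (fun (key, group) -> key, Seq.length group)  // keep the key and the group size
    |> Seq.toList

printfn "%A" (countWords ["hello"; "world"; "Hello"])
// [("hello", 2); ("world", 1)]
```

`Seq.countBy` performs the grouping and the counting in a single call, which is usually the shorter way to write this.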



Sort from most frequent word shown and to less frequent word.

    let sortWords group =
        group
        |> Seq.sortByDescending (fun g -> Seq.length (snd g))

This sorts the groups in descending order (biggest first) by the length (count) of the items in the second part of each tuple; see the representation above.



Now we just need to clean up the output:

    let output group =
        group
        |> Seq.map fst

This picks the first part of each tuple in the group, which is the word itself:

Turns ("hello", ["hello"; "hello"]) into "hello".




Now that we have all the functions, we can chain them together into one pipeline:

    let s = "some long string with some repeated words again and some other words"

    let finished =
        s
        |> getWords
        |> groupWords
        |> sortWords
        |> output

    printfn "%A" finished
    //seq ["some"; "words"; "long"; "string"; ...]
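Since the question starts from a .txt file, here is a rough end-to-end sketch that reads a file and returns each word with its count, most frequent first. It is not part of the original answer: `wordFrequencies` is a hypothetical name, the temporary file only stands in for your own file, and the filter uses `Char.IsLetterOrDigit` instead of the punctuation check above, which is a deliberate simplification.

```fsharp
open System
open System.IO

// Hypothetical end-to-end helper: (word, count) pairs, most frequent first.
let wordFrequencies (path: string) =
    let text = File.ReadAllText(path)
    text.Split([| ' '; '\t'; '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
    // Lower-case each token and keep only letters and digits.
    |> Seq.map (fun w -> w.ToLowerInvariant() |> String.filter (fun c -> Char.IsLetterOrDigit c))
    // Tokens that consisted only of punctuation are now empty; drop them.
    |> Seq.filter (fun w -> w <> "")
    |> Seq.countBy id
    |> Seq.sortByDescending snd
    |> Seq.toList

// Demo with a temporary file; replace the path with your own .txt file.
let demoPath = Path.GetTempFileName()
File.WriteAllText(demoPath, "The cat saw the dog.\nThe dog ran!")
printfn "%A" (wordFrequencies demoPath)
File.Delete demoPath
```

For the demo text, the list starts with ("the", 3) followed by ("dog", 2); the remaining words each occur once.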





answered Nov 9 at 20:51
DaveShaw

  • Wow! Thank you so much Dave, I actually learned quite a lot here! There's a lot I didn't know here so I will read your answer a couple of times again.
    – jubibanna
    Nov 9 at 21:38
















Score: 1









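Here's another way, using Regex:

    open System.Text.RegularExpressions

    let str = "Some (very) long string with some repeated words again, and some other words, and some punctuation too."

    str
    // Split on runs of non-word characters (the backslash in \W is required).
    |> (Regex @"\W+").Split
    // Drop empty entries and lower-case the rest.
    |> Seq.choose (fun s -> if s = "" then None else Some (s.ToLower()))
    // Produce (word, count) pairs, then sort by count, most frequent first.
    |> Seq.countBy id
    |> Seq.sortByDescending snd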
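To see the counts rather than only the resulting sequence, the same pipeline can be wrapped in a function and printed. This is a sketch under assumptions, not the answerer's code: `topWords` is a hypothetical name, and it uses the static `Regex.Split` overload instead of constructing a `Regex` instance.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: the n most frequent words with their counts.
let topWords (n: int) (text: string) =
    Regex.Split(text, @"\W+")                   // \W+ matches runs of non-word characters
    |> Seq.filter (fun s -> s <> "")            // splitting can leave empty strings at the ends
    |> Seq.map (fun s -> s.ToLowerInvariant())
    |> Seq.countBy id
    |> Seq.sortByDescending snd
    |> Seq.truncate n
    |> Seq.toList

topWords 2 "Some (very) long string with some repeated words again, and some other words."
|> List.iter (fun (word, count) -> printfn "%s: %d" word count)
// some: 3
// words: 2
```

Note that `\w` also matches digits and the underscore, so tokens such as `foo_bar` or `42` count as words with this pattern.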





answered Nov 10 at 7:56
gileCAD



























             
