Description
No template file is provided for this homework. Download the markets.json file from the website, and make your HW06.hs Haskell file with your name, any sources you consulted, and any other relevant comments (just like in previous assignments). Then, say {-# LANGUAGE DeriveGeneric, OverloadedStrings #-} module HW06 where
import Data.Aeson import Data.Monoid import GHC.Generics
import qualified Data.ByteString.Lazy.Char8 as B import qualified Data.Text as T import qualified Data.Text.IO as T
and off you go. Do make sure to include this module header, as it makes grading much easier for us.
Preface Setup
You will need two packages that are not part of Haskell’s standard library for this assignment. They are aeson and text. You can install
these with cabal update; cabal install aeson text.1 If you have 1 The cabal update part is to make sure
GHCi open, you will need to restart GHCi to use these downloaded libraries.
Generics
You will see the language extension DeriveGeneric in the module you download the most recent versions of these packages.
header given above. This allows you to name the class Generic2 in a deriving clause when declaring a new datatype. You will not 2 imported from GHC.Generics
use any of the methods or other features of the Generic class.3 The reason to derive Generic is for easy interoperability with the aeson JSON-parsing library. A derived Generic instance encodes various features of a datatype (such as its constructor names, any record-field names, etc.) that advanced Haskellers can (such as the authors of aeson) use to make your life easier. 3 Feel free to look it up!
JSON files
This homework centers around parsing and then querying information stored in a JSON file. JSON is a standardized data interchange format that is easy to read and easy to write. See json.org for the details, but you won’t need to know about details for this assignment.
Instead, the aeson library does all the work for you!
What you do have to worry about is making sure that your Haskell program can find your markets.json file. Putting the file in the same directory as your HW06.hs file is a great start, but it’s not always enough. If you’re having trouble getting your code to find your file, and you’re using GHCi, try running :!pwd. That will print out the current directory GHCi thinks it’s in. (The :! prefix allows you to run arbitrary shell commands within GHCi.) If markets.json isn’t
there, either move it there, or use :cd to move GHCi.4
String theory
Haskell’s built-in String type is a little silly. Sure, it’s programmatically convenient to think of Strings as lists of characters, but that’s a terrible, terrible way to store chunks of text in the memory of a computer. Depending on an application’s need, there are several other representations of chunks of text available. This assignment will need two other representations: ByteString and Text.
The ByteString library helpfully (?) uses many of the same names 4 :cd is a GHCi command. The missing ! is intentional!
for functions as the Prelude and Data.List. If you just import Data.ByteString, you’ll get a ton of name clashes in your code. Instead, we use import qualified … as B, which means that every use of a ByteString function (including operators) or type must be preceded by B.. Thus, to get the length of a ByteString, you use B.length. Even to mention the type ByteString, you must use B.ByteString.
ByteStrings come in several flavors, depending on whether they are lazy or strict and what encoding they use internally. Laziness is a story for another day, and we really don’t want to worry about encodings. For now, use Data.ByteString.Lazy.Char8 and everything will work out nicely.
When working with non-String strings, it is still very handy to use the “…” syntax for writing Text or ByteString values. So, GHC provides the OverloadedStrings extension. This works quite similarly to overloaded numbers, in that every use of “blah” becomes a call to fromString “blah”, where fromString is a method in the IsString type class. Values of any type that has an instance of IsString can then be created with the “…” syntax. Of course,
ByteString and Text are both in the IsString class, as is String.
http://www.bayoucityoutdoors.com/ClubPortal/ClubStatic.cfm?clubID=3&pubmenuoptID=11318
The markets.json file you have downloaded contains information about many (all?) of the Farmers’ Markets that regularly take place throughout the USA, originally retrieved via http://catalog.
data.gov/dataset/farmers-markets-geographic-data. That website produces the data in an Excel spreadsheet. I converted the spreadsheet to a comma-separated-values format (CSV) and then used http://www.convertcsv.com/csv-to-json.htm to get it into a JSON format. I chose JSON because the aeson JSON parser is more advanced yet easier to use than the CSV parser package, cassava.5 5 I am giving you these details in case Take a look at the data by opening up the file in a text editor. you want to look at other datasets. You’ll see that each market entry has many different fields, all with distinct names but of different types.
instead of “Y” and “N”.
One aeson function that parses JSON is called eitherDecode: eitherDecode :: FromJSON a => ByteString -> Either String a
The FromJSON type class is also exported by the aeson package7 Its one method parseJSON :: Value -> Parser a (which you will not have to write in this assignment) says how to parse from JSON to the class type a. Thus, anything in the FromJSON type class can be parsed from a JSON file. Of course, parsing can fail, so eitherDecode returns either the desired value or a String containing error information.
A useful member of the FromJSON type class is Value. Value represents JSON syntax in a Haskell type. Check out its documenta- Hoogle does not search the aeson package by default, so you will have to access the package documentation on Hackage. Try this URL: http: //hackage.haskell.org/package/ aeson-0.8.0.0. There is a newer version uploaded (0.8.0.1), but for some
reason, the documentation isn’t there.7 In case you haven’t noticed, I’m using
“package” and “library” interchangeably.
tion.8 A JSON Value can be one of six things: an object (something 8 Ignore the !s in the documentation.
in braces; a mapping from key names to other values), an array (something in brackets; a listing of JSON values), some text, a number, a Boolean value, or the special constant null. Look a little further down in the documentation to see the definitions for the types
Object and Array.
An Object is a HashMap Text Value — that is, a way to get Values indexed by some Text. However, the details of HashMap aren’t important at all for you. What is critically important is that there is an instance Functor (HashMap k). That means that a valid type for fmap is (a -> b) -> HashMap k a -> HashMap k b.
An Array is a Vector Value. Vector is a type quite like normal They are strictness annotations, which are a story for another day. They don’t affect you at all here.
lists but uses a different internal representation.9 Some operations on 9 Haskell lists are linked lists; Vectors
Vectors are faster than for lists; some are slower. However, the details of Vector aren’t important at all for you. What is critically important is that there is an instance Functor Vector. That means that a valid are arrays.
First, notice that the many Boolean values in the file are labeled with “Y” or “N”. As smart as aeson is, it doesn’t know that “Y” corresponds to True and “N” corresponds to False. Your first task is to construct a Value6 that has been adjusted to have proper Booleans
6 Value is a type from the aeson library.
type for fmap is (a -> b) -> Vector a -> Vector b.
Exercise 1 Write a function ynToBool :: Value -> Value
that changes all occurrences of String “Y” to be Bool True and all occurrences of String “N” to be Bool False. No other part of the input Value should change.
Exercise 2 Write a function parseData :: B.ByteString -> Either String Value
that takes in a ByteString containing JSON data and outputs either an error message or a Value that has been processed by ynToBool.
Hint: This can be very short, if you use Either’s Functor instance!
You can easily test this by saying, for example, filedata <- B.readFile “markets.json”
in GHCi and then calling parseData on filedata.
The Market type
If you define a datatype Market appropriately, include deriving Generic, and say instance FromJSON Market with no where clause, you get an automatic parser for your datatype. For example, if you have
data Person = Person { name :: String
, age :: Int }
deriving (Show, Generic)
instance FromJSON Person
p :: Either String Person
p = eitherDecode “{ “name” : “Richard”, “age” : 32 }”
You get that p == Right (Person “Richard” 32).10 The aeson
10 The extra backslashes in the string above are because the string must
library uses the field names in the Person record (see the lecture notes about record notation) to figure out what the JSON tags should be. The order doesn’t matter – aeson really is using the names.
Exercise 3 Write a Market type, including the fields that interest you. At a minimum, include marketname, x (the longitude of the market), y (the latitude of the market), and state. Use T.Text to represent text. (String also works, but is less efficient.) Then, write a function parseMarkets :: B.ByteString -> Either String [Market]
that uses parseData and fromJSON (from the aeson package) to parse in the list of markets in the file.
Exercise 4 Write an I/O action loadData :: IO [Market]
that loads the market data. In the event of a parsing failure, report the error using fail :: String -> IO a. (fail aborts an action, reporting an error to the user. It never returns, so it can be used no matter what IO type is expected. That’s why it returns type IO a, for any a.)
Once this is defined, you can get your market data by saying mkts <- loadData in GHCi.
contain quotes, and Haskell’s way of putting quotes in strings is to use backslashes like you see here.
Interlude: an ordered-list monoid
Before we get to actually searching the loaded market data, it will be helpful to defined an monoid for an ordered list. An ordered list, which we’ll call OrdList, is a wrapper around lists (in the tradition of First, Last, Sum, and Product, all from the Data.Monoid module) that defines a Monoid instance which keeps the list ordered, from least to greatest. For example:
data OrdList a = OrdList { getOrdList :: [a] } deriving (Eq, Show)
— include this datatype in your code!
instance Ord a => Monoid (OrdList a) where …
— you’ll need to fill in the ellipses
evens :: OrdList Integer evens = OrdList [2,4,6]
odds :: OrdList Integer odds = OrdList [1,3,5]
combined :: OrdList Integer combined = evens <> odds
Now, combined should have the value OrdList [1,2,3,4,5,6], because the (<>) operator maintains the ordering invariant.
Exercise 5 Write the OrdList datatype and its Monoid instance.
Searching with Monoids
Now that you have a way of loading the market data, you need a way of searching through that data. Furthermore, it would be nice if the search mechanism is flexible enough to produce data in a Monoid of the caller’s choice. Although there are, I’m sure, other queries you might want to do, all of our queries are going to center on searching for a market’s name (the marketname field).
Throughout this section, we will be searching among the markets returning a variety of types. To avoid code repetition, it is helpful to use a type synonym. Include the following in your code: type Searcher m = T.Text -> [Market] -> m Thus, a Searcher m is a function that, when given the T.Text to search for in a [Market], will produce an m.
Exercise 6 Write a function search :: Monoid m => (Market -> m) -> Searcher m
that searches through the provided list of Markets for market names containing the given T.Text (Data.Text.isInfixOf will be useful here). With each found market record, use the function provided to convert the market record into the monoid of the caller’s choice, and then combine all the individual results using mappend and mconcat.
marketWithName :: Market -> (T.Text, Market)
marketWithName mkt@(Market { marketname = name }) = (name, mkt)
Note that mkt is matched against the whole market record, while the pattern-match also binds name to the market’s name. The name is just one field in the record matched by mkt.
Exercise 7 Write a function firstFound :: Searcher (Maybe Market)
that returns the first market found by a search, if any are found at all. Like in the case for search, above, your firstFound function can be given arguments, even though the type looks like there should be no arguments. Unlike search, though, this one is definable without taking any arguments, with the right call to search.
compose2 :: (c -> d) -> (a -> b -> c) -> a -> b -> d compose2 = (.) . (.)
Exercise 8 Write a function lastFound :: Searcher (Maybe Market)
that returns the last market found by a search, if any are found at all.
Exercise 9 Write a function
allFound :: Searcher [Market]
that returns all the markets found by a search.
Exercise 10 Write a function numberFound :: Searcher Int
that returns the number of markets found by a search.
Exercise 11 Write a function orderedNtoS :: Searcher [Market]
tht returns all the markets found by a search, ordered from northernmost to southernmost. You will need a wrapper around Market to choose an appropriate Ord instance. This exercise doesn’t take too much code, but getting the structure right is intended to be a challenge.
Exercise 12 (Optional) Now that you’ve built the infrastructure to do queries on this dataset, see if you can find an interesting detail in the data. It should be fairly easy at this point to make a variety of queries. Tell us something we didn’t know about farmer’s markets!
Reviews
There are no reviews yet.