Fun with PowerShell: Deduplicating Records
Let's deduplicate the list of Avengers movies we got in the previous post.
In the previous post, we got a list of Avengers movies from the Open Movie Database and printed it onto the screen.
$movies = Invoke-RestMethod "http://www.omdbapi.com/?apikey=$key&s=Avengers"
$movies.search | Format-List
Title : The Avengers
Year : 2012
imdbID : tt0848228
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BNDYxNjQyMjAtNTdiOS00NGYwLWFmNTAtNThmYjU5ZGI2YTI1XkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_SX300.jpg
Title : Avengers: Age of Ultron
Year : 2015
imdbID : tt2395427
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BMTM4OGJmNWMtOTM4Ni00NTE3LTg3MDItZmQxYjc4N2JhNmUxXkEyXkFqcGdeQXVyNTgzMDMzMTg@._V1_SX300.jpg
Title : Avengers: Infinity War
Year : 2018
imdbID : tt4154756
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BMjMxNjY2MDU1OV5BMl5BanBnXkFtZTgwNzY1MTUwNTM@._V1_SX300.jpg
Title : The Avengers
Year : 1998
imdbID : tt0118661
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BYWE1NTdjOWQtYTQ2Ny00Nzc5LWExYzMtNmRlOThmOTE2N2I4XkEyXkFqcGdeQXVyNjUwNzk3NDc@._V1_SX300.jpg
Title : The Avengers: Earth's Mightiest Heroes
Year : 2010–2012
imdbID : tt1626038
Type : series
Poster : https://m.media-amazon.com/images/M/MV5BYzA4ZjVhYzctZmI0NC00ZmIxLWFmYTgtOGIxMDYxODhmMGQ2XkEyXkFqcGdeQXVyNjExODE1MDc@._V1_SX300.jpg
Title : Ultimate Avengers
Year : 2006
imdbID : tt0491703
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BNDFmZTkxMjktMzRiYS00YzMwLWFhZDctOTQ2N2NlOTAyZDJhXkEyXkFqcGdeQXVyNjgzNDU2ODI@._V1_SX300.jpg
Title : Ultimate Avengers II
Year : 2006
imdbID : tt0803093
Type : movie
Poster : https://m.media-amazon.com/images/M/MV5BZjI3MTI5ZTYtZmNmNy00OGZmLTlhNWMtNjZiYmYzNDhlOGRkL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_SX300.jpg
Title : The Avengers
Year : 1961–1969
imdbID : tt0054518
Type : series
Poster : https://m.media-amazon.com/images/M/MV5BZWI4ZWM4ZWQtODk1ZC00MzMxLThlZmMtOGFmMTYxZTAwYjc5XkEyXkFqcGdeQXVyMTk0MjQ3Nzk@._V1_SX300.jpg
Title : Avengers Assemble
Year : 2013–
imdbID : tt2455546
Type : series
Poster : https://m.media-amazon.com/images/M/MV5BMTY0NTUyMDQwOV5BMl5BanBnXkFtZTgwNjAwMTA0MDE@._V1_SX300.jpg
Title : Avengers Assemble
Year : 2013–
imdbID : tt2455546
Type : series
Poster : https://m.media-amazon.com/images/M/MV5BMTY0NTUyMDQwOV5BMl5BanBnXkFtZTgwNjAwMTA0MDE@._V1_SX300.jpg
This is great, but the real power of PowerShell comes from the ability to manipulate these results. Let's extract just the title from each element of the list.
$movies.Search | foreach Title
The Avengers
Avengers: Age of Ultron
Avengers: Infinity War
The Avengers
The Avengers: Earth's Mightiest Heroes
Ultimate Avengers
Ultimate Avengers II
The Avengers
Avengers Assemble
Avengers Assemble
Pretty neat, but I spot a couple of duplicate. Let's take a look at the raw data again.
$movies.Search | select title, year, imdbID
Title Year imdbID
----- ---- ------
The Avengers 2012 tt0848228
Avengers: Age of Ultron 2015 tt2395427
Avengers: Infinity War 2018 tt4154756
The Avengers 1998 tt0118661
The Avengers: Earth's Mightiest Heroes 2010–2012 tt1626038
Ultimate Avengers 2006 tt0491703
Ultimate Avengers II 2006 tt0803093
The Avengers 1961–1969 tt0054518
Avengers Assemble 2013– tt2455546
Avengers Assemble 2013– tt2455546
The select
command (a shorthand for Select-Object
) allows us to pare down the data we're getting so we can take a closer look. It looks like we have two exact copies of Avengers Assemble
(with the same imdbID) but three different movies called The Avengers
.
What we want to do is to dedupe the list of movies based on their imdbID
. The sort
command (shorthand for Sort-Object
) has exactly what we need. We can sort a list of objects by some property in the object, and if we pass the -uniq
argument to sort
, it will eliminate all but the first copy of the object.
$movies.Search | sort imdbId -uniq | select title, year, imdbId
Title Year imdbID
----- ---- ------
The Avengers 1961–1969 tt0054518
The Avengers 1998 tt0118661
Ultimate Avengers 2006 tt0491703
Ultimate Avengers II 2006 tt0803093
The Avengers 2012 tt0848228
The Avengers: Earth's Mightiest Heroes 2010–2012 tt1626038
Avengers: Age of Ultron 2015 tt2395427
Avengers Assemble 2013– tt2455546
Avengers: Infinity War 2018 tt4154756
Pretty cool so far, but what if we want to convert the result to JSON?
No problem! The ConvertTo-Json
command will convert the objects to JSON for us.
$movies.Search | sort imdbId -uniq | select title, year, imdbId | ConvertTo-Json
[
{
"Title": "The Avengers",
"Year": "1961–1969",
"imdbID": "tt0054518"
},
{
"Title": "The Avengers",
"Year": "1998",
"imdbID": "tt0118661"
},
{
"Title": "Ultimate Avengers",
"Year": "2006",
"imdbID": "tt0491703"
},
{
"Title": "Ultimate Avengers II",
"Year": "2006",
"imdbID": "tt0803093"
},
{
"Title": "The Avengers",
"Year": "2012",
"imdbID": "tt0848228"
},
{
"Title": "The Avengers: Earth's Mightiest Heroes",
"Year": "2010–2012",
"imdbID": "tt1626038"
},
{
"Title": "Avengers: Age of Ultron",
"Year": "2015",
"imdbID": "tt2395427"
},
{
"Title": "Avengers Assemble",
"Year": "2013–",
"imdbID": "tt2455546"
},
{
"Title": "Avengers: Infinity War",
"Year": "2018",
"imdbID": "tt4154756"
}
]
You might be asking: how are you supposed to find the ConvertTo-Json
command? The cool thing about PowerShell is that commands are named, conventionally, as <Verb>-<Noun>
. As a result, PowerShell comes with a command Get-Command
that lets you query all of the commands for the ones you're looking for.
Get-Command -Noun Json
CommandType Name Version Source
----------- ---- ------- ------
Cmdlet ConvertFrom-Json 6.1.0.0 Microsoft.PowerShell.Utility
Cmdlet ConvertTo-Json 6.1.0.0 Microsoft.PowerShell.Utility
Cmdlet Test-Json 6.1.0.0 Microsoft.PowerShell.Utility
One pretty cool thing about Get-Command
: because it returns an array of objects like any other PowerShell command, we can use what we already learned to tone down the noise of this list.
Get-Command -Noun Json | select Name
Name
----
ConvertFrom-Json
ConvertTo-Json
Test-Json
We already covered the fact that you can get help about any command by using -?
or Get-Help
. But on top of printing out the help in your console, you can open up the help in your default web browser.
Get-Help ConvertTo-Json -Online
Opens up your default web browser at the URL of the documentation for the help.
Shorthand vs. Longhand
Now that we've seen Get-Command -Noun
and Get-Command -Verb
, we can understand why PowerShell's commands can seem so verbose, but also why we wouldn't want the PowerShell designers to just stick with the shorthand names.
Let's take a look at all of the commands that work with objects that have aliases:
ForEach-Object
Where-Object
Compare-Object
Compare-Object
ForEach-Object
Group-Object
Measure-Object
Select-Object
Sort-Object
Tee-Object
Where-Object
How did I get that list?
Get-Alias |
foreach ResolvedCommand |
where Noun -eq "object" |
foreach name
First, I asked PowerShell for a list of all of the aliases in the system. Next, I mapped the aliases over the ResolvedCommand
property. Next, I restricted the results to the resolved commands whose noun was Object
, and extracted the name property from each of the matching resolved commands.
How did I figure that out?
PS1> Get-Alias
CommandType Name Version Source
----------- ---- ------- ------
Alias % -> ForEach-Object
Alias ? -> Where-Object
Alias ac -> Add-Content
Alias algm -> Add-LocalGroupMember 1.0.0.0 Microsoft.PowerShell.LocalAccounts
Alias cat -> Get-Content
Alias cd -> Set-Location
...
PS1> # look for members inside of the alias object that has something to do
# with a command
PS1> Get-Alias | Get-Member | where Name -like "*command*"
Name MemberType Definition
---- ---------- ----------
CommandType Property System.Management.Automation.CommandTypes CommandType {get;}
ReferencedCommand Property System.Management.Automation.CommandInfo ReferencedCommand {get;}
ResolvedCommand Property System.Management.Automation.CommandInfo ResolvedCommand {get;}
ResolvedCommandName ScriptProperty System.Object ResolvedCommandName {get=$this.ResolvedCommand.Name;}
PS1> # map over the resolved commands, which give us back the same
# thing as Get-Command, which we used before, so we know how to
# filter it by Noun. Let's map over noun, just to be sure. We'll
# use `select -first`, an important feature we haven't covered yet
# to avoid a massive amount of spew, and then pipe into ConvertTo-Json
# to make the output clear.
PS1> Get-Alias | foreach ResolvedCommand | foreach Noun | select -first 10 | ConvertTo-Json
[
"Object",
"Object",
"Content",
"PSSnapin",
"Content",
"Location",
"String",
"Location",
"Content",
"Host"
]
Looks good! We can now use where
and foreach
to finish the job.
Because I was lazy, I also used PowerShell to create the bulleted list to paste into my blog software.
Get-Alias |
foreach ResolvedCommand |
where Noun -eq "object" |
foreach name |
foreach { "* ``$psitem``" }
* `ForEach-Object`
* `Where-Object`
* `Compare-Object`
* `Compare-Object`
* `ForEach-Object`
* `Group-Object`
* `Measure-Object`
* `Select-Object`
* `Sort-Object`
* `Tee-Object`
* `Where-Object`
In general, the aliases are much prettier:
% -> ForEach-Object
? -> Where-Object
compare -> Compare-Object
diff -> Compare-Object
foreach -> ForEach-Object
group -> Group-Object
measure -> Measure-Object
select -> Select-Object
sort -> Sort-Object
tee -> Tee-Object
where -> Where-Object
How'd I get this list?
```powershell Get-Alias | where { $psitem.ResolvedCommand.Noun -eq "object" } | foreach DisplayName | foreach { "* ``$_``" } ```When working interactively in the shell, the short names are really great. But the longer canonical names follow the <Verb>-<Noun>
convention which makes them more discoverable. Generally speaking, long-time PowerShell authors also find the longer names to be more readable when writing scripts that will need to be maintained.
Bottom line: both the longhand and the shorthand have their place, and you will probably find yourself using the shorthand versions almost exclusively when working interactively in the command-line.
Bonus Points: Digging Into the Command
One last thing: even though Get-Command
gives us a simplified table of the commands that matched, there's way more information inside.
Get-Command -Name ConvertTo-Json | Get-Member
TypeName: System.Management.Automation.CmdletInfo
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ResolveParameter Method System.Management.Automation.ParameterMetadata ResolveParameter(string name)
ToString Method string ToString()
CommandType Property System.Management.Automation.CommandTypes CommandType {get;}
DefaultParameterSet Property string DefaultParameterSet {get;}
Definition Property string Definition {get;}
HelpFile Property string HelpFile {get;}
ImplementingType Property type ImplementingType {get;}
Module Property psmoduleinfo Module {get;}
ModuleName Property string ModuleName {get;}
Name Property string Name {get;}
Noun Property string Noun {get;}
Options Property System.Management.Automation.ScopedItemOptions Options {get;set;}
OutputType Property System.Collections.ObjectModel.ReadOnlyCollection[System.Management.Automation.PSTypeName] OutputType {get;}
Parameters Property System.Collections.Generic.Dictionary[string,System.Management.Automation.ParameterMetadata] Parameters {get;}
ParameterSets Property System.Collections.ObjectModel.ReadOnlyCollection[System.Management.Automation.CommandParameterSetInfo] ParameterS…
PSSnapIn Property System.Management.Automation.PSSnapInInfo PSSnapIn {get;}
RemotingCapability Property System.Management.Automation.RemotingCapability RemotingCapability {get;}
Source Property string Source {get;}
Verb Property string Verb {get;}
Version Property version Version {get;}
Visibility Property System.Management.Automation.SessionStateEntryVisibility Visibility {get;set;}
DLL ScriptProperty System.Object DLL {get=$this.ImplementingType.Assembly.Location;}
HelpUri ScriptProperty System.Object HelpUri {get=$oldProgressPreference = $ProgressPreference…
Armed with that information, we can give ourselves a better table of information about the JSON
facilities that come with PowerShell.
Get-Command -Noun Json | select Name, Noun, Verb, HelpUri
Name Noun Verb HelpUri
---- ---- ---- -------
ConvertFrom-Json Json ConvertFrom https://go.microsoft.com/fwlink/?LinkID=217031
ConvertTo-Json Json ConvertTo https://go.microsoft.com/fwlink/?LinkID=217032
Test-Json Json Test https://msdn.microsoft.com/en-us/library/bb299886.aspx
By now you should be getting the picture: since PowerShell works with collections of objects rather than text files, you can use your basic knowledge on any kind of collection you come across.
As your fundamental skills improve, you'll be able to manipulate not only JSON documents, but also processes, files, and even functions and aliases in the same way!